THE STRUCTURE OF WORK VALUES IN RELATION
TO STATUS, ACHIEVEMENT, INTERESTS,
AND ADJUSTMENT
It has frequently been suggested that values may be classified as
intrinsic or extrinsic, as those which are inherent in and derived from
the activity or object itself, or as those which are the outcomes or
concomitants of having the object or participating in the activity.
Thus E. Ginzberg, S. W. Ginsburg, Axelrad, and Herma (1951, p. 217)
suggested, on the basis of general observation, that the significant
satisfactions derived from work fall into three distinct, though
related, types. These are rewards (monetary and prestige), concomitants
(social and environmental), and intrinsic satisfactions (pleasure in
the activity and in the accomplishment of specific ends). Darley and
Hagenah (1955, pp. 139, 169, 191-192) made use of the
intrinsic-extrinsic dichotomy, borrowing heavily from early writing by
Fryer (1931). Rosenberg (1957, pp. 13-16) identified three value
complexes in his study of the values of college students: the
self-expressive (intrinsic), people, and extrinsic-reward types of
values, and rel[...]tional goals and vocational choice. The writer
(Super, 1957, pp. 299 [...]

[...]re uses, and various studies cited by [...] authors make the
intrinsic-extrinsic classification seem plausible, the hypothesis that
values are thus organized has not been adequately tested (only as this
analysis was proposed was the O’Connor-Kinnane, 1961, study carried
out). If the classification is psychologically as well as
philosophically sound, measures of intrinsic values should be more
highly correlated with each other than with measures of extrinsic
values. If there are no more positive correlations within value
categories than between value categories, then [...] reports some [...]
support the three-way [...]

A second problem connected with the measurement and study of values is
that of the identification and description of values as distinguished
from interests, needs, adjustment, and other personality variables. The
Allport-Vernon-Lindzey Study of Values, following Spranger’s theory,
identifies six values, without relevance to the intrinsic-extrinsic
dichotomy. Some of these values have been shown to be highly correlated
with interests such as those measured by the Strong Vocational Interest
Blank (Super, 1949, pp. 423-423), and Brogden (1952), for example,
produced a rather different list as a result of factor analysis. There
is thus a need for a factor analytic study of a number of supposed
value, interest, and other personality variables.


PROCEDURE 
Instruments 


The Work Values Inventory (WVI) developed by the writer for use in the
Career Pattern Study (Super, Crites, Hummel, Moser, Overstreet, &
Warnath, 1957; Super & Overstreet, 1960) is the measure of values used
in this investigation. It consists of two parts, each of 105 paired
comparisons, yielding scores for 15 presumed values. Each value is
represented by two statements, such as “Work which benefits other
people” and “Work in which you help other people,” designed in this
illustrative item to tap altruistic or social welfare values with two
slightly differing statements selected experimentally from among
several preliminary variations. In Part I of the Inventory each
first-form statement is paired with each of the other 14
value-statements, for a total of 105 combinations; in Part II the same
thing is done with the second forms and variations. Each value-score is
the sum of the number of times that value is chosen over the other
values with which it is paired, in 28 pairings out of a total of 210.
The maximum possible score is thus 28. Early studies with the WVI were
reported in a thesis by Hana (1954), in the second Career Pattern Study
(CPS) monograph (Super & Overstreet, 1960), a study by O’Hara and
Tiedeman (1959), and a factor analysis of a modified form by O’Connor
and Kinnane (1961). A preliminary manual including relevant data from
these and other lesser studies of the development of the WVI is
currently in preparation. The 15 values measured are shown in Table 1.
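The scoring rule just described can be sketched in a few lines. This is a minimal illustration and not the published scoring key: the function names, the integer coding of the 15 values, and the hypothetical respondent are all assumptions introduced here.

```python
from itertools import combinations

N_VALUES = 15  # the WVI yields scores for 15 presumed values


def all_pairs(n_values=N_VALUES):
    """Every unordered pair of values; 15 values give 105 pairs per part."""
    return list(combinations(range(n_values), 2))


def score_wvi(part1_choices, part2_choices, n_values=N_VALUES):
    """Count how often each value is chosen across Parts I and II.

    Each argument maps a pair (i, j) to the index of the value chosen.
    """
    scores = [0] * n_values
    for choices in (part1_choices, part2_choices):
        for pair, chosen in choices.items():
            if chosen not in pair:
                raise ValueError(f"choice {chosen} is not a member of pair {pair}")
            scores[chosen] += 1
    return scores


# Hypothetical respondent who always prefers the lower-numbered value.
pairs = all_pairs()
part1 = {pair: min(pair) for pair in pairs}
part2 = {pair: min(pair) for pair in pairs}
scores = score_wvi(part1, part2)
print(len(pairs), max(scores), sum(scores))
```

Each value meets the other 14 once in each part, so no score can exceed 28, and the 15 scores of any respondent must total 210.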


Intrinsic-extrinsic values were measured as such in a content analysis
of interviews with the subjects of this study. Each boy was privately
interviewed on four different occasions, in a semistructured interview,
the outline of which appears in the appendix of the first CPS monograph
(Super et al., 1957). Part of the interview dealt with values sought in
work and in life; the interviews were tape recorded and later
transcribed. The method of interview analysis is described in a
dissertation by Yoganarasimhiah (1957) and is essentially a rating
based on content analysis.
Status was defined in intellectual, socioeconomic, and cultural terms.
Intellectual status was measured by the Otis Quick-Scoring Test of
Mental Ability, Gamma Form. Socioeconomic status was judged by parental
occupation, using the Hamburger (1958) revision of the occupational
rating scale which is one part of Warner’s (Warner, Meeker, & Eells,
1949) Index of Status Characteristics. Cultural stimulation was measured
by a scale developed by Super and Heyde for a Biographical Inventory
(Super & Overstreet, 1960), based on multiple-choice items such as
number and type of magazines in the home, museum visits, concerts
attended, designed to assess the richness of the cultural resources to
which the subject has been exposed. A measure of vocational aspiration
level was also included, which consisted of the vocational preference of
the boy converted into an occupational rating by the
Moser-Dubin-Shelsky revision of the Roe scale (Roe, 1956).

Achievement was assessed by academic, extracurricular, community,
employment, adolescent independence, and peer acceptance measures, so as
to tap various important adolescent developmental tasks. Academic
achievement was measured by junior high school grades in the three
constant and universal courses of those years (Bernstein, 1953).
Participation in school and in community activities was measured by
summing the weights given to various types of participation in the
respective kinds of activities (Hudson, 1953). Independence of work
experience was appraised by content analysis of interviews like those
dealing with intrinsic-extrinsic values (Cohen, 1958; Super &
Overstreet, 1960). Adolescent independence was measured by a
Biographical Inventory key like that for cultural stimulation, above.
Peer acceptance was measured by a sociometric variation of the Who [...]
(Super & Overstreet, 1960).

Interests were measured by Strong’s Vocational Interest Blank, scored
for all occupations: only eight occupational “pillar” scores (typical of
the principal occupational groups) and the masculinity-femininity scale
were used. The Men’s Form of Strong’s Blank was “translated” into [...]
use in this study [...]

Adjustment was assessed by Rotter’s Incomplete Sentences Test, by a
similar scoring of the TAT for total adjustment (Super & Overstreet,
1960), by Henderson’s (1958) Father Identification Test, and by a Family
Cohesiveness scale developed for the Biographical Inventory mentioned
above (Super & Overstreet, 1960).

Finally, the Vocational Maturity score derived by Super and Overstreet
(1960) from the best CPS indices designed to measure this
characteristic was used. Vocational maturity is here defined as the
adequacy of the behavior of the individual in dealing with the
vocational developmental tasks which he is confronting. The VM score is
based on a content analysis of our tape recorded interviews, and
reflects concern with vocational choice, information about and planning
for the preferred occupation, and acceptance of responsibility for
choice and planning.

Subjects


The subjects of this analysis are 88 of the “core group” boys of the
Career Pattern Study, described in detail in the Super and Overstreet
(1960) monograph. They were ninth grade boys in Middletown, New York, in
1951-52, typical of their class of 143 boys, of Middletown ninth graders
of other years, and of ninth grade boys in other American small cities
and large towns, of which Middletown itself was shown to be typical.


RESULTS 


Intrinsic-Rewards-Concomitants Trichotomy 


The intercorrelations of the 15 values of 
the WVI are shown in Table 1. 

The first seven values in the list are those 
which may be classified as intrinsic, the next 
four (8-11) deal with rewards, and the last 
four (12-15) concern concomitants of work. 
If the theory is sound, there should be more 
significant positive correlations in Sectors 1-7, 
Sectors 8-11, and Sectors 12-15, than in other 
sectors of the matrix. To facilitate the inspec- 
tion of the table, the correlations were di- 
chotomized as statistically significant in the 
expected direction at the .01 level (asterisk) 
or statistically nonsignificant (above .01, no 
asterisk). There were 3 positive significant correlations out of 37
possible in the sectors in which they were expected, or 8.11%, as
contrasted with 4 positive significant correlations out of 68 in the
sectors in which they were not hypothesized, or 5.29%. The difference
between these two percentages is not statistically significant. It
cannot be said, therefore, that the 15 values measured by the WVI tend
to cluster in the three hypothesized categories.
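
The comparison of the two percentages can be checked with a short calculation. The text does not say which significance test was applied; the sketch below assumes Fisher's exact test on the 2 x 2 table of counts as printed, and the function name is introduced here for illustration. Computed from those counts, 4 out of 68 comes to about 5.9% rather than the printed 5.29%; the conclusion is unaffected.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "hits" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Counts as printed in the text.
within_sig, within_total = 3, 37      # sectors where correlations were expected
between_sig, between_total = 4, 68    # sectors where they were not hypothesized

pct_within = 100 * within_sig / within_total
pct_between = 100 * between_sig / between_total
p_value = fisher_exact_two_sided(within_sig, within_total - within_sig,
                                 between_sig, between_total - between_sig)
print(f"{pct_within:.2f}% vs {pct_between:.2f}%, two-sided p = {p_value:.2f}")
```

Under these assumptions the p-value is far above any conventional significance level, in agreement with the statement that the two percentages do not differ reliably.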

It may perhaps be meaningful, from a 
philosophical standpoint, to think in terms of 
intrinsic, reward, and concomitant values: the 
logic of the classification still seems good. But 
it is not helpful to do so when studying the 
structure of values in human subjects, for peo- 
ple who tend to score high on one intrinsic 
value, e.g., creativity, do not tend to score 
high on all other intrinsic values, e.g., mas- 
tery. Apparently, as the matrix shows, some 
extrinsic values do tend to go together, for 
security and economic returns are positively 
and significantly correlated, as are prestige 
and economic returns, but security and prestige are negatively
correlated. None of the seven intrinsic values are significantly and
positively interrelated, and some are actually negatively related
(altruism-independence, achievement-management). Apparently the value
structure of individuals cuts across Ginzberg’s trichotomy, so that
people are best characterized as seeking some intrinsic values, certain
rewards, and particular concomitant satisfactions.


TABLE 1


RELATIONSHIPS BETWEEN VALUES GROUPED ACCORDING TO HYPOTHESIZED CATEGORIES


It is relevant to ask, then, whether some 
more meaningful structuring of values can be 
achieved. The work of O’Connor and Kinnane 
(1961) in factor analyzing the items used in 
the WVI has identified six factors which 
may be named as follows, modifying the 
O’Connor-Kinnane labels slightly: 


A. Material Success (prestige, security, economic returns)
B. Altruism (social, creativity, management, supervisory relations)
C. Conditions and Associates (surroundings, associates, independence)
D. Heuristic Creative (intellectual stimulation, creativity, esthetic)
E. Achievement Prestige (achievement, prestige, independence, way of
   life)
F. Independence, Variety (independence, variety, way of life)


Factorial Structure of Values, Interest, Adjustment, and Status


In order to throw additional light on the 
structure and nature of values, a factor analy- 
sis was carried out of the matrix of intercor- 
relations of WVI scores and the other meas- 
ures described earlier. 

The factor analysis of instruments such as 
the Strong VIB, the WVI, and the Biographi- 
cal Inventory has been criticized by Guilford 
(1952). In the case of the SVIB and the BI, 
some items are used in more than one key. 
In the SVIB, at least, the scores are fac- 
torially complex. In the WVI, as in the Kuder, 
some scales may be univocal, so pure already 
as to fail to be locatable in the three or more 
measures necessary for the satisfactory identi- 
fication of a factor. And it is particularly im- 
portant that, in the WVI as in the Kuder 
(which Guilford uses as an illustration), the 
scores are substantially ipsative, high scores 
on some scales in a paired-comparison instru- 
ment automatically forcing other scores down 
and producing a disproportionate number of 
negative correlations. Guilford’s points are 
valid and hence are noted here, but one must 
still ask whether they should prevent one 
from using the method. Factor analyses of 


ipsative measures have frequently been car- 
ried out, with results which seem useful (e.g., 
Brogden, 1952); factor analyses of the SVIB 
have been performed with results of demon- 
strated value (Strong, 1943). In this instance, 
the use of a number of measures minimizes 
likelihood of failure to identify a factor; the 
analysis of some complex measures used by 
counselors should lead to a better understand- 
ing of much-used concepts and to the develop- 
ment of purer measures; and caution can be 
used in drawing conclusions concerning fac- 
tor structures from the analysis of ipsative 
measures. 
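
Guilford's point about ipsative scores can be made concrete with a small simulation. The sketch below is an illustration only: it generates purely random paired-comparison choices (no real WVI data) and shows that, because every respondent's scores add up to the same total, the covariances among scales must sum to the negative of the total variance. The function names and the choice of 88 simulated respondents are assumptions made for the example.

```python
import random
from itertools import combinations


def simulate_scores(n_subjects, n_scales, seed=0):
    """Random paired-comparison scores: each pair awards one point to one scale."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n_scales), 2))
    data = []
    for _ in range(n_subjects):
        scores = [0] * n_scales
        for pair in pairs:
            scores[rng.choice(pair)] += 1
        data.append(scores)
    return data


def covariance_matrix(data):
    """Sample covariance matrix of the columns of data."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]


data = simulate_scores(n_subjects=88, n_scales=15)
cov = covariance_matrix(data)
k = len(cov)
total_variance = sum(cov[i][i] for i in range(k))
off_diagonal = sum(cov[i][j] for i in range(k) for j in range(k) if i != j)
print(round(total_variance, 3), round(off_diagonal, 3))
```

Even though the simulated choices are random, the off-diagonal covariances sum to a negative number equal in size to the total variance, which is the built-in excess of negative relationships that calls for caution in interpreting factors drawn from such scores.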


Values, Interests, and Personality Traits 


The 40 × 40 matrix of intercorrelations¹ was subjected to a centroid
factor analysis, and the axes were rotated by the Varimax method. Ten
factors were extracted. The results are reported in Table 2.
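
For readers unfamiliar with the centroid method, its first step can be sketched as follows. This is a textbook-style illustration, not a reconstruction of the computations performed for this study: it extracts only the first centroid factor, omits the sign reflection needed before later factors and the Varimax rotation, and uses a small hypothetical matrix with communalities on the diagonal.

```python
from math import sqrt


def first_centroid_factor(r):
    """Loadings of the first centroid factor of a correlation matrix r.

    Each loading is a column sum divided by the square root of the grand
    sum. The diagonal of r should hold communality estimates.
    """
    col_sums = [sum(row[j] for row in r) for j in range(len(r))]
    grand = sum(col_sums)
    if grand <= 0:
        raise ValueError("grand sum must be positive; reflect variables first")
    return [s / sqrt(grand) for s in col_sums]


def residual_matrix(r, loadings):
    """Correlations left over after removing one factor."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical one-factor matrix built from known loadings.
true_loadings = [0.8, 0.6, 0.5, 0.3]
r = [[a * b for b in true_loadings] for a in true_loadings]

loadings = first_centroid_factor(r)
residual = residual_matrix(r, loadings)
print([round(x, 3) for x in loadings])
```

When the matrix really contains a single factor, as in this constructed case, the centroid loadings reproduce the generating loadings and the residuals vanish; with real data the residual matrix is carried forward to extract the next factor.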

Factor 1—Nonmaterial Cultural Interests. The first factor appears most
clearly in the interest scales of Strong’s VIB, with heavy positive
loadings in the biological and physical sciences, artistic and literary
occupations and negative loadings on the YMCA secretary and purchasing
agent scales. It also has sizable loadings on several WVI scales,
although these tend to be lower: positive loadings are on creativity and
intellectual stimulation, negative on economic returns, management, and
way of life values. It seems to involve nonmaterial interest of a
cultural-intellectual variety.

Factor 2—Cultural Utilization. The second factor to emerge has heavy
loadings largely in the indices of status and achievement: cultural
stimulation, vocational maturity, IQ, parental occupational level,
grades, adolescent independence, participation in community activities,
and peer acceptance; it also has loadings on some of the measures of
interpersonal adjustment, of which peer acceptance might perhaps be
considered one but in which category family cohesiveness and father
identification more clearly belong. There is a significant negative
loading on the WVI scale of managerial values, and on the
interview-derived index of extrinsic values. The factor appears to
involve the utilization of environmental resources for personal
development.

¹ A 2-page table of the 40 × 40 matrix has been deposited with the
American Documentation Institute. Order Document No. 7177 from ADI
Auxiliary Publications Project, Photoduplication Service, Library of
Congress; Washington 25, D. C., remitting in advance $1.25 for microfilm
or $1.25 for photocopies. Make checks payable to: Chief,
Photoduplication Service, Library of Congress.

Factor 3—Manipulation of People versus 
Things or Ideas. The third factor appears pri- 
marily in the interest scales, with heavy posi- 
tive loadings on the sales and literary-legal 
keys, and heavy negative loadings on the 
farmer and masculinity keys. There are lesser 
positive interest (purchasing agent) and nega- 
tive interest (engineer) loadings; there are 
also positive loadings on the prestige values 
scale, the vocational maturity index, and par- 
ticipation in school activities; and negative 
loadings on creativity values, independence of 
work experience, and adjustment as measured 
by the Incomplete Sentences Test. The posi- 
tive loadings are all on variables which involve 
dealing with people and interpersonal situa- 
tions, the negative on variables in which 
things or ideas seem most important. 

Factor 4—Social Contact versus Content. The fourth factor is revealed
primarily in the WVI scales: associates, supervisory relationships,
security, and surroundings have positive loadings; independence,
intellectual stimulation, esthetics, and management have negative.
Participation in community activities also has some positive loading.
The factor thus seems to be one stressing social contacts more than the
content of the activity.

Factor 5—Tangibles versus Intangibles. The fifth factor to emerge is
also brought out by the WVI scales, with positive loadings on economic
returns, security, surroundings, and supervisory relationships, and
negative loadings on altruism and prestige. The continuum appears to be
one of emphasis on the tangible, present, supportive as contrasted with
the intangible, less immediate, less direct, products or by-products of
work.


Factor 6—Other versus Inner-Direction.
The sixth factor is also a values factor, has 
a substantial positive loading on the manage- 
ment scale and a lesser loading on the eco- 
nomic returns scale, with negative loadings on 
esthetics and achievement; it has also a lesser 


positive loading on intelligence. The con- 
tinuum is not easily named, but appears to 
be one of other versus inner direction, of 
managing enterprises and activities involving 
others and seeking rewards commonly sought 
by others as contrasted with valuing beauty 
and mastery, seeking self-expression in one’s 
activities. That intelligence is associated with 
other rather than with inner direction is per- 
haps surprising. It should be noted that Fac- 
tor 5, Tangibles versus Intangibles, has some 
similarity to this one; it seems to differ from 
Factor 6 in that it has no intelligence load- 
ing, and involves more passive-receptive as 
contrasted with active-creative values. 

Factor 7—Pleasure versus Task Orienta- 
tion. The seventh factor is also a values vari- 
able; it has positive loadings on variety and 
independence, negative loadings on achieve- 
ment, security, and intellectual stimulation. 
The boy’s vocational aspiration level and fa- 
ther identification also have negative loadings. 
The factor appears similar to Ginzberg’s 
pleasure-versus-task orientation, although it 
is difficult to fit father identification into 
this interpretation unless fathers are task ori- 
ented in relation to their sons. 

Factor 8—Isolation versus Social Effective- 
ness. The eighth factor is not so well defined, 
for only the VIB purchasing agent scale has 
an appreciable positive loading, although the 
loadings on adolescent independence and in- 
dependence of work experience may be worth 
noting. Negative loadings on TAT adjust- 
ment, participation in school activities, peer 
acceptance, community activities, and voca- 
tional aspiration level are noteworthy, and 
that on the YMCA secretary scale may be 
worth mentioning. The factor may be one of isolation or solitariness
versus social effectiveness, with the negative end of the continuum more
readily named than the positive.

Factor 9—Bestowed versus Achieved Status. The ninth factor also is not
well defined, but has lesser positive loadings of interview-rated
extrinsic values and of parental occupational level, negative loadings
of way of life, surroundings, grades, vocational aspiration level, and
peer acceptance. The positively loaded variables seem to have in common
satisfactions which are given or which come by virtue of the efforts of
others or of a situation (dependency?); the negatively loaded variables
appear to involve satisfactions for which one works oneself (autonomy?).

Factor 10—Thinking Introversion versus Conformity. The final factor
extracted is particularly difficult to name; it has moderate positive
loadings on creativity and intellectual stimulation values, lesser
negative loading on IST adjustment, purchasing agent, grades, parental
occupational level, and security. These loadings suggest that the
continuum may be one of thinking introversion versus conforming
achievement.


DISCUSSION 


Factors which might best be classified as value variables appear to be
Numbers 4 (Social Contact versus Content), 5 (Tangibles versus
Intangibles), 6 (Other versus Inner Directed), and 7 (Task versus
Pleasure) with Number 1 (Nonmaterial Cultural) being both value and
interest; the only clearly interest variable appears to be Number 3
(People-Thing Manipulation); although it has some slight loading in the
values area, we have seen that Number 1 may also be put in both
categories; personality and adjustment variables tapped appear to be
Factors 8 (Isolation versus Social Effectiveness), 9 (Bestowed versus
Achieved Status), and 10 (Thinking Introversion versus Conformity); one
achievement variable appears, Factor 2 (Cultural Utilization).

Of particular interest in the study of values is the fact that the 15
values scores of the WVI appear to yield four factors not revealed by
the interest or adjustment measures, and two values which may equally
well or better be called interests, for a total of six apparent value
factors; the eight vocational interest scales yield only one or at best
two factors which are not better assigned to the values area; and both
interests and values appear quite distinguishable from personality
traits and adjustment variables.

This factor analysis of the 15 scales of the Work Values Inventory and
of a number of other instruments identified six value factors, the same
number as that which was found by O’Connor and Kinnane (1961) in
analyzing only the items used in this inventory, but the factors do not
appear to be identical. This is to be expected as the earlier analysis
involved only WVI items, administered in a different manner, whereas
this study has analyzed a much more comprehensive matrix, in which a
greater variety of correlations might bring out a differing factor
structure. The O’Connor-Kinnane Factors A, D, E, and F appear similar to
the present Factors 6, 5, 1, and 7, respectively, while B and C bear
some resemblance to 3 and 4, but not enough to justify pairing them.
ANGULAR ESTIMATION ¹


SIDNEY L. SMITH 


MITRE Corporation 


In one study, 10 Ss estimated the directional trend (heading) of simulated 
radar trails, using different response modes; rotary switch adjustment per- 
mitted better accuracy than numerical estimation. Varying the displayed 
length of the simulated trails from ys to 14 inches had no apparent effect 
on estimation accuracy. 5 civilian Ss proved more accurate than 5 airmen. 
In a 2nd study, 20 Ss estimated the angular position of lines varying in length 
from & to 1 inch, using equipment which permitted switch adjustment and 
numerical estimation only to the nearest 10 degrees. Results were the same as 
before. In addition, this report notes differences in estimation accuracy and 
bias related to the actual angle of displayed lines over a 360-degree range, 


as well as biasing effects of right- vs. left-handed switch adjustment. 


In man-machine systems one of the func- 
tions frequently allotted to man is the esti- 
mation of directional or angular relationships. 
Human ability to make such estimates is the 
basis of our motor skills—for a relatively sim- 
ple response sequence such as picking up a 
pencil just as for considerably more complex 
activities such as piloting an aircraft, percep- 
tion of directional relations is continually re- 
quired of us. This paper reports the results 
of two experimental studies exploring human 
ability to estimate angular relations. In one 
case, subjects were asked to report the direc- 
tional trend (“heading’’) of a series of radar 
returns on a simulated display. In the other, 
they judged the angular position of displayed 
vectors, or straight lines. 

For a human operator performing an air 
surveillance function, viewing raw radar re- 
turns on a PPI scope or filtered data on some 
sort of situation display, equipment facilities 
are generally adequate and the perception of 
simple two-dimensional directional relations 
is comparatively easy. Whether he must note a particular position, or
must judge the bearing of one point with respect to another, or must
estimate the heading (projected direction) of a series of returns, the
stimulus information he requires is available on the display. In such a
situation, however, difficulties may arise when the operator must
respond, or report his observations. The means provided him to report
directional relations may influence the ease and accuracy with which he
can do this job.

¹ The research reported in this article was supported by the Department
of the Air Force under Air Force Contract AF-33(600)3[...]. A more
detailed account of this research was published as a MITRE Technical
Series Report, MTS-6, “Heading Estimation by the Human Operator,” March
1962.

In the case of positional designation, our 
common experience assures us that pointing, 
when possible, is preferable to verbal descrip- 
tion. Happily, this compelling conviction is 
confirmed by published studies, as for exam- 
ple one by Reed and Bartlett (1947). As for 
estimation of direction, our experience sug- 
gests that here also the equivalent of pointing 
would be preferable to verbalization. The two 
studies described in this report confirm this 
view. 

EXPERIMENT I 
Procedure 


Ten subjects were asked to make heading esti- 
mates for a series of 60 simulated radar trails. Each 
man went through this series four times (with the 
simulated trails presented in a different random order 
each time) using four different response modes for 
indicating his estimate. The subjects were instructed 
to make their estimates as accurately and quickly as 
possible. Five of the subjects were airmen, with op- 
erational experience in the use of an eight-position 
detented rotary switch for reporting heading esti- 
mates. The other five subjects were civilian col- 
leagues at MITRE with no specific experience in 
making such heading estimates. 

The simulated radar trails consisted of strings of black dots inked on
white paper. The latest “return,” which had to be distinctive to permit
heading estimation, was indicated by a larger dot. The simulated trails
were displayed in three different lengths to determine the extent to
which this factor influences heading estimation. The lengths used were
[...] inches, 1 inch, and 14 inches as measured on the display surface,
with 20 trails displayed of each length. The choice of lengths was
related to a particular design application in the Air Force SAGE
(Semi-Automatic Ground Environment) System and has no theoretical basis.
Subjects’ viewing distance to the display surface was approximately 18
inches.

Each subject estimated headings for the series of 60 simulated trails
four times, using four different response modes, a total of 240
estimates. Two of the response modes involved adjustment of a rotary
switch, mounted in one case to the right of the displayed tracks for
right-handed use (all subjects were right-handed) and in the other case
to the left for left-handed adjustment. This switch had a circular knob,
2+ inches in diameter and 1 inch thick, with a white arrow painted
across its diameter. The knob was continuously adjustable, and no
reference lines were provided on its mounting base. The subject was
simply asked to turn the knob so that the arrow pointed in the same
direction as the displayed radar trail.

In the other two response modes, the subjects were asked to give
numerical heading estimates from 1 to 360, to the nearest degree if they
could. The orientation used was that a trail heading directly up would
be called “360,” to the right was “90,” directly down was “180,” etc. In
one response condition, the subjects made these estimates with no
external reference. In the other, subjects were encouraged to refer to a
simple azimuth circle (8-inch diameter with 1-degree intervals and
10-degree labeling) placed on the table before them.

Each subject used the response modes in a different sequence to balance
possible order effects. The experimenter recorded the error for each
estimate made, and the total time required. The subjects were given no
indication of the accuracy of their estimates at any time during the
experiment.

RESULTS AND DISCUSSION 


There were no consistent differences in estimation accuracy for tracks
of different displayed length. As a group, the civilian subjects were
more accurate in estimating the displayed headings than the airmen
were. Both subject groups were more accurate using rotary switch
adjustment than when they were required to make numerical estimates of
heading. A summary of average estimation error is presented in Table 1.



Analysis of variance of estimation errors was undertaken along the
lines recommended by Edwards (1950) for data involving repeated
measurements from the same subjects. This variance analysis was based on
the sums of each subject’s 20 estimation errors under each combination
of response mode and display length. Subject groups differed
significantly at p < .001 (F = 19.5, df = 1/8). Response modes were
significantly different at the same level (F = 14.4, df = 3/105).
Display length had no significant effect, nor did any interaction term.


TABLE 1

HEADING ESTIMATION ERRORS (in degrees) IN FIRST EXPERIMENT

Columns: length of displayed radar trail [...]

Airmen
  Switch adjustment
    Right-handed     7.6   6.7   [...]
    Left-handed      7[...]   6.6   9.6
  Numerical estimation
    Unaided          9.4   11.2  [...]
    Aided            8.3   11.8  [...]
Civilian
  Switch adjustment
    Right-handed     [...]
    Left-handed      [...]
  Numerical estimation [...]

Duncan’s range test (1955) was applied to the differences among
response modes. There was no significant difference between right- and
left-handed switch adjustment, nor between aided and unaided numerical
estimation. However, both methods of switch adjustment were superior to
both modes of numerical estimation at the .01 level of significance.

In terms of speed, the average time required per estimate seemed more
than anything else to be characteristic of each particular subject, from
one individual to another ranging from 3.6 to 7.0 seconds. Because of
this variability, no difference between subject groups was demonstrated.
An analysis of variance based on speed of response, comparable to that
already described, indicated that the only statistically reliable
differences were among response modes. The range test confirms at the
.01 level that unaided numerical estimation was quicker (4.7 seconds)
than either right- or left-handed switch adjustment (5.6 and 5.4
seconds). Aided numerical estimates averaged 5.2 seconds.

The results with regard to estimation error 
deserve amplification. For certain practical 
applications something further must be known 
about the expected distribution of operator 
heading estimation errors than simply the 
mean. Figure 1 presents the cumulative fre- 
quency curves for errors of increasing size for 
all subjects. Since the data analysis did not 
reveal any reliable differences in accuracy be- 
tween right- and left-handed switch adjust- 
ment, or between aided and unaided numeri- 
cal estimation, all adjustment data are plotted 
as one curve and numerical estimation data 
as another. The greater accuracy of switch 
adjustment is illustrated: 58% of the switch 
adjustments, for example, were within 5 de- 
grees of the displayed heading, whereas only 
48% of numerical estimates were this accu- 
rate. 
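
The cumulative frequency curves described above can be reproduced from raw error data along the following lines. This is only a minimal sketch: the function name, the thresholds, and the sample errors are hypothetical and are not taken from the study.

```python
def cumulative_error_distribution(errors, thresholds):
    """Return, for each threshold, the percentage of estimates whose
    absolute error (in degrees) does not exceed that threshold."""
    if not errors:
        raise ValueError("errors must not be empty")
    magnitudes = [abs(e) for e in errors]
    total = len(magnitudes)
    return {
        t: 100.0 * sum(1 for m in magnitudes if m <= t) / total
        for t in thresholds
    }


# Hypothetical signed errors in degrees (illustration only, not study data).
sample_errors = [2, -4, 7, 0, -12, 5, 9, -3, 15, 1]
curve = cumulative_error_distribution(sample_errors, thresholds=[5, 10, 15])
```

Plotting one such curve for the pooled switch adjustments and another for the pooled numerical estimates would give the kind of comparison made in Figure 1.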

Previous investigators (Chapanis, 1951; 
Hunt & Warrick, 1957) have noted differ- 
ences between right- and left-handed adjust- 
ment responses. The present data also show 
such differences, not in terms of average size 
of error but in terms of direction or bias of 
error. There was a tendency for right-handed 
adjustment to produce relatively fewer clockwise errors (36%) than
left-handed adjustment (65%). Chi square analysis of the
directional frequency of errors confirmed this difference at the
.001 level.

FIG. 1. Cumulative error distribution in first experiment
(abscissa: maximum size of error, in degrees).
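
A chi square comparison of directional error frequencies of this kind can be sketched as follows. The cell counts are hypothetical (the report gives only percentages), and the uncorrected Pearson statistic used here is an assumption about the form of the test.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = hand used, columns = error direction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts chosen only to mirror the reported percentages:
# right-handed: 36 clockwise, 64 counterclockwise
# left-handed:  65 clockwise, 35 counterclockwise
statistic = chi_square_2x2(36, 64, 65, 35)

# Tabled critical value of chi square for 1 degree of freedom at the .001 level.
CRITICAL_001_DF1 = 10.828
is_significant = statistic > CRITICAL_001_DF1
```

With these illustrative counts the statistic is about 16.8, which exceeds the tabled value; the actual value depends on the real number of errors in each cell.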

Another factor of interest is that, although encouraged to
"estimate to the nearest degree if you can," the subjects actually
assumed self-imposed response limitations: 90% of their numerical
responses were multiples of 5, ending either in 5 or 0. Switch
settings were, of course, much more evenly distributed. A similar
finding of response quantizing is reported by Chapanis (1951).

The failure to discover any consistent influence of displayed
length of the simulated radar trails on heading estimation accuracy
is surprising. We might expect that longer trails would provide
more adequate stimulus cues to direction, and hence more accurate
estimation. It is certainly clear that for extremely short trails
heading estimation would deteriorate, and in the limiting case of a
single displayed point it would become impossible. The fact that
such an effect was not demonstrated over a range of displayed
lengths reducing from 14 to ;; inches suggests that even shorter
tracks may constitute an adequate stimulus for heading estimation.
This possibility was investigated in the second study described in
this report.


EXPERIMENT II 

Procedure 
The experimental design of the second study differed from the
first in four respects: more subjects were run; they were required
to make quantized heading estimates using two response modes,
switch adjustment and numerical estimation; and the displays were
designed to permit the use of shorter tracks, and to ensure a more
equal sampling of heading direction over the 360-degree range than
obtained in the first study.

Twenty men were used, 10 airmen and 10 civilians. In each subject
group, three of the men were "experienced" subjects from the
earlier heading estimation experiment.

The displays used consisted of bright vectors (straight lines)
rear-projected on a dark screen, so as to measure inch in width,
and in four lengths 1, 4, 4, and & inches. These vectors provided
an unambiguous indication of heading even for very short display
lengths. The tail (back end) of each vector was indicated by a dot,
offset to one side so as not to represent a visual continuation of
the vector itself.

Each subject viewed in random sequence 36 vectors for each of the
four display lengths used, using each of the two response modes,
making a total of 288 heading estimates. The particular headings
chosen
for display were selected so that each 10-degree in- 
terval of possible azimuth directions (001 to 010 de- 
grees, 011 to 020, etc.) was represented by one vec- 
tor of each of the four display lengths used. 
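
The sampling scheme just described (one heading per 10-degree interval, per display length, per response mode) can be sketched as below. The length labels, the random choice of heading within each interval, and the shuffling within each response mode are assumptions made for illustration; the report does not state how these details were handled.

```python
import random


def build_trial_schedule(lengths, modes, seed=0):
    """Build one subject's trials as (mode, length, heading) tuples, with one
    heading drawn from each 10-degree interval (001-010, 011-020, ...,
    351-360) for every combination of display length and response mode."""
    rng = random.Random(seed)
    schedule = []
    for mode in modes:
        block = []
        for length in lengths:
            for k in range(36):
                low = 10 * k + 1
                block.append((mode, length, rng.randint(low, low + 9)))
        rng.shuffle(block)  # assumption: random order within a response mode
        schedule.extend(block)
    return schedule


# Hypothetical labels for the four display lengths and two response modes.
trials = build_trial_schedule(
    lengths=["A", "B", "C", "D"],
    modes=["switch adjustment", "numerical estimation"],
)
```

The schedule contains 36 x 4 x 2 = 288 trials, matching the total stated in the procedure.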

In this second study, the rotary switch was not 
continuously adjustable. Instead, it was detented at 
36 positions corresponding to the 10-degree azimuth 
intervals. Similarly, the subjects were not permitted 
to make numerical estimates guessing to the nearest 
degree. Instead, they had to report their estimates on 
push-button modules which provided only 10-degree 
accuracy. This equipment-constrained response quantizing meant
that a subject could not estimate correctly in every case, even as
a theoretical proposition. The best he could do was to minimize
error size by making an optimal switch setting, or the nearest
possible numerical estimate. For a vector displayed at an angle of
107 degrees, for example, an optimum estimate would be 110 degrees,
a second best estimate would be 100 degrees, and so on.
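
The notion of an optimum and a second best quantized estimate can be made concrete with a short sketch. It assumes that the 36 available settings fall at multiples of 10 degrees (10, 20, ..., 360), as the 107-degree example implies, and that error is measured the short way around the circle; the function names are illustrative.

```python
def angular_error(displayed, response):
    """Smallest absolute difference, in degrees, between two headings."""
    diff = abs(displayed - response) % 360
    return min(diff, 360 - diff)


def ranked_settings(displayed, step=10):
    """Rank the quantized settings (step, 2*step, ..., 360) from best to
    worst for a displayed heading; ties go to the lower setting."""
    settings = range(step, 361, step)
    return sorted(settings, key=lambda s: (angular_error(displayed, s), s))


best, second_best = ranked_settings(107)[:2]
```

For a displayed heading of 107 degrees this ranks 110 first (error of 3 degrees) and 100 second (error of 7 degrees). An estimate was then "optimal" if it matched the first-ranked setting.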


RESULTS AND DISCUSSION 


The civilian group was again more accu- 
rate than the airmen. Both groups were more 
accurate estimating heading by switch adjust- 
ment than when they made numerical esti- 
mates reported by button pushing. They were 
also faster using switch adjustment. Both the 
number and kinds of estimation errors were 
consistently related to the particular heading 
direction displayed: fewer errors were made 
for headings approximating the direction of 
the four cardinal points (360°, 90°, 180°, 
and 270°); those errors that were made
tended to emphasize the perceived discrepancy
between the actual heading and these
implicit vertical and horizontal references.
Ambiguous differences were noted relating to
the effect of displayed vector length on
estimation accuracy.


The average estimation error for each sub- 
ject group using the two different response 
modes is summarized in Table 2. Since these 
results represent all estimates made in this 
study, it is assumed that average error size is 
a legitimate comparative measure in spite of
the enforced quantizing error for single
responses.

Variance analysis along the same lines as 
described before, confirmed at the .001 level 
the differences noted in the first study: the 
civilians were more accurate as a group than the airmen
(F = 36.4, df = 1/18) and switch adjustment was the more accurate
response mode (F = 36.9, df = 1/136). It should be mentioned that
this confirmation does not simply reflect the fact that some of the
same subjects were used. Six men, it is true, did participate in
both studies. However, the data from the 14 new subjects show the
same differences, in kind and degree, between subject groups and
between response modes. The chief point of interest is that the
superiority of switch adjustment held up even when the constraints
of response quantizing were made identical with those for numerical
estimation.

TABLE 2

HEADING ESTIMATION ERRORS (IN DEGREES): SECOND EXPERIMENT

Subjects, by length of displayed vector

Airmen
  Switch adjustment
  Numerical estimation
Civilian
  Switch adjustment
  Numerical estimation

The average estimation error for those sub- 
jects who participated in both studies in- 
creased by about a degree for both switch ad- 
justment and numerical estimation, under the 
conditions of the second study. The difference 
is a small one, and we may conclude that 
10-degree quantizing is almost if not quite 
equivalent to a continuous response capability 
in terms of expected average estimation error.
This is confirmed by the plot of cumulative 
error distribution presented in Figure 2, which 
varies surprisingly little from the uncon- 
strained response conditions of the first study. 

In terms of response time for heading esti- 
mation, the switch adjustment mode which 
was slower in the first study turned out to be 
significantly faster in the second, averaging 
4.1 seconds as compared with 5.5 seconds for 
numerical estimation. Presumably this reflects 
the fact that precise adjustments were not 
possible in the second study, and so the subjects
could make a quicker, more casual selection
among switch settings. Numerical estimates,
on the other hand, had to be reported
by button insertion rather than simply stated
verbally, which would increase response time
somewhat.

FIG. 2. Cumulative error distribution in second experiment
(abscissa: maximum size of error).

The picture with regard to the effect of dis- 
played vector length is more confusing. The 
average estimation error, for all subjects using 
both response modes, is as follows: 


Average error (degrees), listed by vector length (inches):
10.1, 8.0, 8.3, 9.2

Marginally significant effects on error attributable to display
lengths were indicated by the variance analysis (F = 3.97,
df = 3/136). Application of the range test to the differences in
estimation accuracy associated with different displayed vector
lengths resulted in a somewhat confusing conclusion: accuracy for
}-inch vectors was less than for |-inch vectors (at the .01 level)
and possibly less than for $-inch vectors (.05 level), but not
significantly different than for 1-inch vectors. It is true, and
somewhat reassuring that the shortest vectors, which provided the
least adequate angular cues, resulted in the greatest average
estimation error. However, it would be more encouraging if the
accuracy differences between j}-inch and still longer vectors than
}-inch were also statistically reliable. This was not the case. In
this connection, a possible alternative approach to the
data, a chi square analysis based on observed 
frequency of optimal versus nonoptimal esti- 
mates, confirmed statistically reliable differ- 
ences between subject groups, and between 
response modes, but none related to display 
length. The only clear conclusion from these 
present studies is that the adequate cue for 
angular perception is surprisingly short. 

Data from the first study demonstrated a 
small error bias associated with right- versus 
left-handed switch adjustment. This seems to 
have been confirmed in the present context, 
in spite of the 10-degree detenting of the 
rotary switch. For those subjects making right-handed switch
adjustments, 35% of their errors were clockwise, whereas subjects
working left-handed made 52% clockwise errors. Chi square analysis
confirms this difference (χ² = 85, p < .001). There was no
difference in error bias indicated by a corresponding analysis of
right- and left-handed button insertion (χ² = .07), which tends to
rule out the alternative hypothesis that the particular subjects
involved in this comparison had consistent perceptual biases.

A summary of errors as related to displayed direction is presented
in Figure 3. Fewer errors were made in estimations of headings in
directions approximating the cardinal azimuth points, 90, 180, 270,
and 360 degrees, which represent implicit vertical and horizontal
references. This effect seems to have been somewhat more pronounced
for numerical estimation than for switch adjustment. In the case of
switch adjustment, there seems to be both an improvement in
accuracy at the cardinal points and for heading directions
approximating quadrant bisections, near 45, 135, 225, and 315
degrees.

The directional bias of errors as related to displayed heading is
presented in Figure 4. A trend seems to be apparent toward
clockwise errors for headings somewhat clockwise of the cardinal
directions, and counterclockwise errors for headings displaced
somewhat in a counterclockwise direction from the horizontal or
vertical. In brief, the subjects when they made errors seemed to
emphasize the perceived disparity between a displayed heading
approximating a horizontal or vertical direction and the implicit
reference itself.

FIG. 3. Estimation errors as related to displayed heading in second
experiment.

As it happens, the tendency toward response bias is greatest where
the tendency toward error is least; i.e., for headings
approximating vertical or horizontal directions as displayed. Thus,
for practical purposes we might well be able to ignore such effects
altogether. Moreover, these data, after all, are in some degree
inferential. Because of the response quantizing feature of this
study, we must rely on error frequency data for illustrating these
differences. It is clear that an experimenter who is interested in
an effective exploration of these phenomena should permit his
subjects to make continuous manual adjustments, and present a
selection of displayed headings finely distributed over the
360-degree range. Since this does not describe the conditions of
the present study, it was concluded that no statistical analysis of
the present results was appropriate. However, the present results
are suggestive.

FIG. 4. Directional trend of estimation errors in second experiment
(ordinate: errors, i.e., nonoptimal estimates, that were clockwise;
abscissa: displayed headings combined in ten-degree intervals).

Perhaps the most relevant previous work 
on this question is described in a series of re- 
search reports published by the Mount Holy- 
oke College Psychophysical Research Unit 
(Kaufman, Reese, Volkmann, & Rogers, 
1947; Reese, Volkmann, Rogers, & Kauf- 
man, 1948; Rogers, Volkmann, Reese, & 
Kaufman, 1947). These dealt with estima- 
tion of bearings over the range from 350 de- 
grees clockwise to 100 degrees. Because of 
differences in experimental procedure and 
their limited range of stimuli, it is difficult to 
draw comparisons with the present data. They 
did note a similar "anchoring" effect of error
reduction in the vicinity of 360 and 90 degrees.
However, there is no apparent confirmation
in their data of the "sharpening"
effect noted here, the exaggeration of perceived
discrepancies from vertical and horizontal
references.
UNION ATTITUDE AND JOB SATISFACTION
IN INDIAN WORKERS¹

DURGANAND SINHA² AND KESHAB C. SARMA³

Allahabad University

Relationship between attitude towards union and job satisfaction
was studied on a sample of 100 workers in a light engineering
factory in India by use of specially constructed interview
schedules. There was a significant negative association between the
2 measures (r = −.47). Of the personal factors, age, marital
status, and length of union membership were significantly related
to job satisfaction (p < .01). None of these were, however, found
to influence union attitude significantly.

Organization of workers into a union is an accepted and prevalent
pattern of the contemporary industrial setup. It is formed to
enable workers to bargain effectively and as a protective weapon
against the management. It represents the prime medium for
substitute satisfaction for employee needs and demands which are
frustrated by the job conditions under which people work in a
modern industry. Apart from economic reasons, the union provides
the individual with a sense of participation in dealing with vital
issues, a channel for expression, social ties and group relations,
and with an opportunity to achieve positions of leadership and
authority usually denied on the job (Golden & Ruttenberg, 1942;
Krech & Crutchfield, 1942). As suggested by Walker and Guest (1952)
the union serves to counterbalance a lack of personal satisfaction
with immediate work experience and meets in part the psychological
and social needs which the work in the plant has created. It
represents an emotional as well as an economic dimension in the
worker's attitudes, and a kind of psychological bulwark against
pace, boredom, bigness, and impersonality of management.

As far as working life is concerned, the union may be regarded as
the most vital group to which an average worker belongs. Favorable
attitude towards it would imply greater identification with union
activities and acceptance of its norms and standards of judgment,
while an unfavorable attitude would mean that it plays a less
important role in influencing his perception and behavior. Quite
often the worker being dissatisfied with job conditions looks to
the union for fulfilling his needs. Once he joins it and develops
certain identifications it is likely to affect considerably the way
he would view the job and the conditions surrounding it. Therefore,
his reactions to his job, his general adjustment and group
relations outside the job (in short, his overall job satisfaction)
are likely to be associated with the attitude he possesses towards
his union. The aim of the present investigation was to study the
nature and extent of relationship between union attitude and job
satisfaction. Some light has also been cast on certain personal
factors influencing job satisfaction and attitude towards the
union.

¹ The study was conducted at Indian Institute of Technology,
Kharagpur, India.
² Formerly at Indian Institute of Technology, Kharagpur, India.
³ A graduate student, Indian Institute of Technology, Kharagpur,
1959-


METHOD

The study was conducted in a light engineering factory around
Calcutta. It has a good reputation as an employer. Due to recent
expansion of the plant, the wages were high and work conditions
were good. Unlike many other industries in the country, only one
union was thriving in the organization. Nearly 97% of the workers
belonged to the union which was affiliated to the All India Trade
Union Congress. It was a left-wing union and had the recognition of
the management.

A sample of 100 workers was randomly selected and with the
cooperation of the labor officer and the sectional head, interviews
in individual sessions were arranged on the shop floor. Two
questionnaires, one to assess the attitude towards the union and
the other job satisfaction, were constructed. In the former, there
were 18 statements mostly on the lines of Rosen and Rosen (1955)
and Strauss and Sayles (1952). Items dealing with union management
of social organizations and institutions and the inside leadership
were left out as not applicable to Indian conditions. Scores
between 0-18 were possible, high scores being indicative of
favorable attitude towards union. As assessed by the split-half
technique the questionnaire was a reliable one (r = .82).

The job satisfaction questionnaire had 20 statements covering the
areas mentioned by Blum (1949) and Sinha (1958). Possible scores
were 0-20, higher ones being indicative of greater degree of job
satisfaction. Its reliability was .80 by the split-half method.

Due to the difficulty of literacy, these questionnaires were used
as the basis of interview conducted in the local language by the
junior investigator. All the statements were memorized and care was
taken to cover all the points in the course of the interview. It
had the flow of an informal talk often lasting for about an hour.
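
A split-half reliability estimate of the kind reported for the two questionnaires can be sketched as follows. The report does not say how the items were divided or whether the Spearman-Brown correction was applied, so the odd-even split, the correction step, and the sample responses below are all assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per respondent.
    Returns (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# Hypothetical responses of five workers to a six-item schedule.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
```

Each respondent's score on the odd-numbered items is correlated with the score on the even-numbered items, and the result is then stepped up to the length of the whole questionnaire.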


RESULTS 
The summary of scores on the union attitude and job satisfaction
questionnaires is given in Table 1.

TABLE 1

SUMMARY OF SCORES ON THE JOB SATISFACTION AND
UNION-ATTITUDE QUESTIONNAIRES

Questionnaires       N      M      Median

13.09
9.64

12.41
8.97

Job satisfaction     100
Union attitude       100

The job satisfaction scores
were skewed towards the favorable side show- 
ing that the workers on the whole possessed 
favorable attitudes. Thirty-six percent of the 
sample scored between 12-14 and more than 
one-third between 15-17. The distribution on 
the union attitude questionnaire was slightly 
skewed at the lower end. About one-third of 
the sample had scored between 6-8 indicating 
that they were not very favorably inclined 
towards the union. 


TABLE 2

COMPOSITE CHI SQUARE TABLES FOR INFLUENCE OF
PERSONAL FACTORS ON JOB SATISFACTION
(N = 100)

Job satisfaction score groups: Not satisfied (5-10),
Indifferent (11-14), Satisfied (15-18)

Personal factors              Groups
Age                           Below 35 years; Above 35 years
Education                     Illiterate; Below matric; Matric and above
Monthly income                Lower ($21.00-53.00); Higher (above $53.00)
Marital status                Single; Married
Number of dependents          Small (1-6); Medium (7-9); Large (10 and more)
Length of union membership    Below 8 years; Above 8 years

TABLE 3

INFLUENCE OF PERSONAL FACTORS ON UNION ATTITUDE
(N = 100)

Personal factors: Education; Monthly income; Marital status

The relationship between the two scores was studied by calculating
the product-moment correlation. It was found to be −.47, which was
significant at the 1% level. It indicated that a favorable attitude
towards the union had a significant negative association with job
satisfaction. In other words, workers having a more favorable
attitude towards the union were likely to be less satisfied with
their job.

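
The significance of a product-moment correlation of this size can be checked with the usual t transformation, t = r·sqrt(N − 2) / sqrt(1 − r²). The sketch below applies it to the reported values (r = −.47, N = 100); the function name is illustrative, and the critical value quoted in the comment is an approximate tabled figure rather than anything given in the report.

```python
from math import sqrt


def t_for_correlation(r, n):
    """t statistic (with n - 2 degrees of freedom) for testing whether a
    product-moment correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


t_value = t_for_correlation(-0.47, 100)

# Two-tailed critical t at the 1% level with 98 df is roughly 2.63.
significant_at_01 = abs(t_value) > 2.63
```

The resulting t is about −5.27, well beyond the critical value, which agrees with the statement that the correlation was significant at the 1% level.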
According to the score on job satisfaction, the sample was divided
into "not satisfied," "indifferent," and "satisfied" groups. The
chi square test was used to assess the influence of various
personal factors on job satisfaction scores (Table 2). Of the six
factors, viz., age, education, income, marital status, number of
dependents, and length of union membership, χ² was significant only
for age (χ² = 10.39; p = .01), marital status (χ² = 49.16;
p = .01) and length of union membership (χ² = 12.30; p = .01).
Thus, job satisfaction was dependent upon the workers' age, whether
they were married or not, and their length of association with the
union. Among younger workers over 49% were satisfied and over 28%
in the indifferent group as against 21 and 61%, respectively, of
the older workers. Of the workers who were unmarried, over 55% were
satisfied and only 13.8% dissatisfied, while among the married,
only 33.8% were satisfied and 24% dissatisfied. Among workers with
shorter membership in the union, the percentage of satisfied ones
was high (54%) while among those who had been union members for
over 8 years, it was only 24%. Education, income, and number of
dependents were not found to exert any significant influence on the
job satisfaction score.

The chi square test was also used to assess the influence of
various personal factors on union attitude. None seemed to have
influenced the workers' score on the union attitude questionnaire
significantly (Table 3). Thus, the knowledge of certain personal
factors enabled prediction of job satisfaction. The same was not
possible for union attitude.
DISCUSSION


The main finding of the present investiga- 
tion is the significant negative relationship be- 
tween job satisfaction and union attitude. A 
worker possessing favorable attitude towards 
the union tended to be dissatisfied with the 
job. Stagner, Flebbe, and Wood (1952) in 
their study on railroad workers bring out the 
role of the union as a source of satisfaction 
on the job. It was found that items pertaining
to union-management relationship were
most discriminating between 50 satisfied and
50 dissatisfied workers. Other studies have
revealed that workers satisfied with the job and
work conditions do not like to join unions.
The main reasons for joining were usually a
demand for higher wages, better working
conditions, and other benefits. Being dissatisfied
with their job conditions, the workers looked
to the union for fulfilling these and other
needs.


Its implication on the present role of the 
union in labor-management relationship is 
obvious. One of the chief reasons for the un- 
ionism has been that it has served as a chan- 
nel for the expression of workers’ needs and 
demands and a vehicle of their dissatisfac- 
tion. There is a polarization between the man- 
agement and the union and the accepted norm 
of the union is usually one of hostility and 
dissatisfied attitudes. Therefore, workers pos- 
sessing a favorable attitude towards the un- 
ion indicated a degree of identification with 
its norms and viewed the job situation in an 
unfavorable light. 

The sample as a whole showed higher job 
satisfaction than is usually observed in India. 
The organization where the study was conducted
happens to be one of the well-managed
modern factories with good facilities for
training, a successful incentive scheme, and
working conditions superior to other concerns in
the area. Due to rapid modernization and
expansion in the last 12 years and a well-designed
training scheme, the average worker
has advanced in status and skill, and their
earnings have gone up considerably. Their
earnings are much higher than in other
factories in the same neighborhood. The per
capita per annum income in India is
approximately 60 dollars whereas the average
monthly income of the sample studied was
roughly 58 dollars. These factors together ac- 
counted for the generally high job satisfac- 
tion scores of the workers. 

With regard to job satisfaction, only three 
of the personal factors studied were significantly
related, viz., age, marital status, and
length of union membership. Hoppock (1935)
had observed older workers as showing
relatively higher level of job satisfaction. In the
present investigation, the opposite tendency
was noticed. Married workers were also a
little dissatisfied. Increased financial strain
with age and matrimony seems a likely
reason for their dissatisfaction. But economic
strain resulting from increased domestic
responsibility and burden in maintaining a
family cannot be regarded as the sole
condition. Otherwise significant negative
relationship with the number of dependents should
also have been observed. Certain factors other
than purely financial were obviously
operating.

Length of union membership is negatively 
associated with job satisfaction. Longer
association meant greater acceptance of the
union's norm in viewing the job and feeling
discontented with it. One of the reasons for
joining the union at least in the present
context is the dissatisfaction and hostility
towards the management. Unions are largely
concerned with extraction of a few privileges
and advantages rather than acting as a
partner in a common endeavor. Being members
in the union for a long time, the workers
tended to adopt the "hostile role" and the
norm of dissatisfaction towards the work
situation.

Longer membership in the union was also 
expected to create greater identification and 
a more favorable attitude towards it. But it 
did not seem to be the case. Knowing the
length of association with the union, while it
engendered some dissatisfaction, it did not
create a significantly favorable attitude
towards itself. It is often the case that unions
are largely concerned with questions of wages
and come to the fore usually in times of labor
trouble and disputes. They seem to thrive on
discontent and dissatisfaction. As such, in
times of union-management harmony and
industrial peace, workers have nothing to
sustain their interest. Moreover, unions are still
not strong enough to bargain effectively with 
the powerful management and bring all the 
benefits to the workers which they are ex- 
pected to do. Hence, a little disillusionment 
with the union is natural and lies at the 
root of their not identifying themselves more 
closely with the unions. As the UNESCO re- 
port on Southeast Asia (1956) puts it: 


Trade unions still, in many cases, are organizations
to start a strike, less often strong enough to
maintain it, and very seldom able to interest the workers
in periods of industrial peace.


Another reason may be the consciousness on 
the part of the workers about the present in- 
adequacies of the unions in India and only a 
partial satisfaction with its role in industrial 
life. Most unions are affiliated with some all- 
India bodies having definite political leanings. 
As a result, a highly centralized organization 
is developing with remote control from the 
central bosses over local unions. Further,
because of their political tinge, unions often get
embroiled with local politics sometimes
ignoring the interests of the workers. These are
certainly factors governing the workers'
attitude towards the union. In any case, the
question needs a more careful investigation.

Lastly, a strange but interesting result was
that although union attitude and job
satisfaction were significantly related to each
other, and some personal factors could
permit some prediction of job satisfaction, they
did not permit any prediction of union
attitude. It indicates that the role of unions in
the life of the average Indian worker is not
yet fully crystallized so that a preparation of
an "index of union predisposition" based on
the knowledge of his background is not
possible. A more intensive analysis of personality
and internal dynamics of workers in relation
to their attitude towards the union is called
for.
FAKING "CHANCE" ON THE

"HARD R

STEPHENSON

University of Iowa

Are scores that fall within the so-called "chance" areas of certain
occupational scales of the Strong Vocational Interest Blank for Men
(SVIB) "easily obtainable by chance"? To answer this question, Ss
were selected whose scores, under standard testing conditions, were
either higher than chance, lower than chance, or in the chance area
itself. These Ss were then instructed to "fake" directionally (in
the direction of the chance area) and to "fake chance." The results
indicated that Ss who can fake directionally cannot fake chance,
even when the chance range is in the same direction as the one they
have faked. It was concluded that rather than ignore scores within
the chance area, it may be better to ignore the chance areas
themselves.

There is no question but that the Strong 
Vocational Interest Blank for Men (SVIB) is 
fakable. Individual subjects can fake scores 
up or down (Longstaff, 1948), can fake in- 
dividual occupational scales (Garry, 1953), 
or groups of occupational scales (Longstaff, 
1948), or group occupational scales (Gehman, 
1957). Further, there is evidence that it is im- 
possible to completely and accurately detect 
such faking (Gray, 1959). 

The evidence is also quite clear that there 
are individual differences in the ability to dis- 
simulate scores on the SVIB (Benton & Korn- 
hauser, 1948; Garry, 1953). However, these 
differences appear to be unrelated to psycho- 
logical sophistication (Bordin, 1943; Geh- 
man, 1957; Longstaff, 1948), to amount of 
information about the occupation-to-be-faked 
(Garry, 1953), or to intelligence (Garry, 
1953). The evidence favors the assumption 
that some scales are easier to fake than are 
others (Garry, 1953), at least for any given 
group, but is not clear as to whether or not 
there is a sex difference in ability to fake; 
Garry (1953) found no sex difference while 
Longstaff (1948) found women to be less suc- 
cessful at faking than men. 

There is no evidence concerning dissimula- 
tion of so-called “chance” scores on the SVIB. 
This lack of evidence is disturbing in view 
of the evidence (Stephenson, 1961) that the 
modal “score” of an individual over all of the 
occupational scales of the SVIB is a score 
in the chance area (chance mean, plus and 
minus one standard deviation). Relevant here 
are Strong’s (1943) observations that a “consideration
in interpreting interest scores is the
possibility that a score may be obtained by
chance” (p. 86) and, by inference, if a score
is within the chance area, it is likely “that
[such a score] is the resultant of chance” (p.
87), and, finally, that such a score is “easily
obtained by chance” (p. 87). The relevance
of these observations is in the interpretation
of results on the SVIB, where the dictum
(Strong, 1952) is that: “Scores falling within
the shaded [chance] area are indeterminate:
they help sometimes to show, along with other
scores, the general trend of one’s interests in
an occupational group. But generally they
can be ignored” (italics added).

The problem of the present investigation is 
now clearly in focus: If we are to ignore so-called
chance scores, then we are ignoring
most of our data on any given SVIB report
form. This is no problem if we are ignoring
this data for cogent reasons. However, if we 
are ignoring this data on the assumption that 
such scores are “easily obtainable by chance” 
and, therefore, “indeterminate for the scales 
in question,” then it seems highly important 
and necessary to check the validity of this as- 
sumption. To make this validity check is the 
purpose of the present investigation. 


METHOD 


Subjects. Subjects were chosen from a wide variety
of sources. The sources included volunteer students
in psychology courses at all levels, volunteer clients
at the University Counseling Service, and volunteer
nonstudents. The specific criteria for acceptance as
a subject were to: (a) be of the male sex; (b)
score within a certain range of scores on a particular
SVIB occupational scale upon original testing under
standard conditions; (c) have the ability to fake
directionally on the experimental scale in question
(subjects who could not fake directionally could not
be expected to “fake chance”); and (d) never have
received vocational counseling.

Occupational scales used. There were three such 
scales. To maintain homogeneity, these scales were 
all in Group IV (technical) on the standard Hankes 
SVIB report form. These scales were: 


1. Forest Service Man: Chosen because the chance 
range is the lowest of any scale on the SVIB. To be 
eligible as Forest Service Man (FSM) subjects, 
subjects had to score at least 10 standard score 
points above the chance area on the FSM scale. 

2. Printer: Chosen because the chance range is 
relatively high when compared with all occupational 
scales and because this scale has the highest chance 
range in Group IV. To be eligible as Printer (P) 
subjects, subjects had to score at least 10 standard 
score points below the chance range. 

3. Carpenter: Chosen because this scale has the 
median chance elevation in Group IV and ap- 
proximates the median for all occupational scales. To 
be eligible as Carpenter (C) subjects, subjects had 
to score within the chance range on this scale. 


Faking directions used. There were three sets of 
instructions used. Each subject received the Fake 
Chance set, plus one other set. All instructions were 
dittoed, were presented separately, and were preceded 
by the statement: 


You have already taken the attached vocational 
interest test at least once. Your present task is to 
retake this test, following the usual procedure for 
marking the answer sheet, but with the special 
“test-taking attitude” that will now be described: 


All special instructions were followed by the usual
admonitions not to discuss the specific attitudinal set
with one’s classmates, and by a statement that the
test administrator “does not know about this experiment,
and has been specifically instructed not to
help you, except in those matters directly related to
the mechanics of test administration.” All experimental
tests were taken in the psychometrics room
of the University Counseling Service, and all were
administered by a trained psychometrist. The
specific instructions were:


1. Fake High or Low Instructions: This is a test
of your ability to “fake” a score on a single occupational
scale of the attached test. The occupational
scale involved is the scale for ______.
Answer each relevant item in such a manner that
the final result will be a ______ score on the
______ scale.

2. Respond Same Instructions: You first took the
attached vocational interest test on ______.
The present test is a test of your ability to answer
each and every item exactly as you did when you
first took this test. If you succeed in doing this,
the results of both testings should be exactly the
same. Your “score” on this part of the experiment
will depend on how well you can recall your
previous response to each and every item.

3. Fake Chance Instructions: This is a test of
your ability to “fake” a score on a single occupational
scale of the attached test. The occupational
scale involved is the scale for ______. Answer
each relevant item in such a manner that
the final result for the scale will
be the same as a score that is obtainable by chance.
Remember, you are not to answer the entire test
with a “chance” attitude, but only those items
that will give you a chance score on the ______
scale.


Experimental groups. There were a total of 175 
subjects. These were originally selected on the basis 
of the criteria previously indicated. Each subject 
was then randomly assigned to a counterbalanced 
experimental group. Each group had N=25, as 
follows: 


Forest Service Man: These 50 subjects were randomly
assigned as follows: N = 25 each to Fake
Low-Fake Chance (FLC) and Fake Chance-Fake
Low (FCL) groups.

Printer: These 50 subjects were randomly as- 
signed as follows: N =25 each to Fake High-Fake 
Chance (PHC) and Fake Chance-Fake High (PCH) 
groups. 

Carpenter: Here there were 75 subjects. These 
were randomly assigned to one of three experimental 
groups as follows: Fake Chance-Fake Low (CCL); 
Fake Chance-Fake High (CCH); and Fake Chance- 
Respond Same (CCS). A score was classified “same” 
if it was within five standard score points of the 
original score. 
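
The two scoring rules used throughout the study (a score counts as "chance" when it lies within one standard deviation of the chance mean, and as "same" when it is within five standard score points of the original) can be written out as a minimal Python sketch. The function names and the example numbers are hypothetical, and treating both boundaries as inclusive is an assumption; the text does not say how boundary scores were handled.

```python
def in_chance_area(score, chance_mean, chance_sd):
    """True if the score lies within chance_mean +/- one chance_sd (inclusive, by assumption)."""
    return abs(score - chance_mean) <= chance_sd


def is_same(retest, original, tolerance=5):
    """True if the retest score is within `tolerance` standard score points of the original."""
    return abs(retest - original) <= tolerance


# Hypothetical values for illustration only; they are not taken from the study.
chance_mean, chance_sd = 25.0, 5.0
print(in_chance_area(28, chance_mean, chance_sd))  # 3 points from the chance mean
print(in_chance_area(31, chance_mean, chance_sd))  # 6 points from the chance mean
print(is_same(30, 25))                             # exactly 5 points apart
```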

RESULTS 


Carpenter. The three C groups are homo- 
geneous with respect to original mean scores 
(F = 1.69; p> .05) and with respect to 
Fake Chance mean scores (F < 1.0). How- 
ever, when the combined distribution of orig- 
inal scores is compared with the combined 
distribution of Fake Chance scores, the difference
is significant (t = 2.15; p < .05).

The three distributions of Fake Low, Fake 
High, and Respond Same scores differ sig- 
nificantly (F = 221.92; p < .01). Thus, since 
there is no significant difference between the 
means of the distributions of Respond Same 
scores and original scores (t = .11; p > .05),
the differences between the distributions of
Fake High scores versus original scores and
also Fake Low scores versus original scores are
significant.

We may get a measure of the amount of
faking by subtracting each Fake Low score
from that subject’s original score and from
each subject’s Fake High score, his original
score. These two distributions may be described
as follows: Fake High minus original
score: M = 23.08, SD = 7.66; original score
minus Fake Low score: M = 21.20, SD =
9.63. When the means of these two distributions
are compared, the observed difference is
found to be nonsignificant (t = .40; p > .05).

The final Carpenter question concerns the
ability of these subjects to fake chance. It is
relevant to note first that the true chance
means and standard deviations listed in Table
1 are mathematically exact (Lyerly, 1957;
Strong, 1959) and, thus, have no sampling
error. It is meaningless, therefore, to ask if
the observed, faked chance means could differ
from the true due to random sampling variation;
they could not. Thus, these subjects
could not fake chance.

TABLE 1

SUMMARY OF FINDINGS

Column groups: Carpenter; Forest Service Man; Printer (each with a Total column).
Variables: N in group; Original M; Original SD; True Chance M; True Chance SD;
Fake Low, Fake High, Respond Same M; Fake Low, Fake High, Respond Same SD;
Fake Chance M; Fake Chance SD; N in chance area (Fake Chance);
N in chance area (Fake Low-High-Same); N excluded from study (unable to fake).
(Cell values not recoverable.)

Forest Service Man. The two FSM groups
are homogeneous with respect to original
mean scores (t = .25; p > .05) and with
respect to Fake Low mean scores
(p > .05). Combined original mean scores
differ significantly, of course, from the combined
Fake Low mean (t = 7.56; p < .01).
Finally, the two groups are homogeneous with
respect to Fake Chance mean scores (t = .90;
p > .05), with the combined Fake Chance
mean differing from the mathematically true
chance mean and also from the combined
Fake Low mean (t = 8.36; p < .01) but not
from the original mean (t = 1.50; p > .05).

Again we may measure amount-of-faking
by subtracting from each original score the
low score of each subject. These two distributions
of difference scores may be described as
follows: FLC: M = 29.88, SD = 12.69;
FCL: M = 29.16, SD = 12.24. The difference
between these two means is nonsignificant
(t = .08; p > .05).

Printer. Again the two groups are homogeneous
with respect to original mean scores
(t = .15; p > .05) and with respect to Fake
High mean scores (t = .41; p > .05). Again
due to the method of sample selection, the
distribution of original scores and the distribution
of Fake High scores must differ
(t = 7.87; p < .01). The two P groups are
homogeneous with respect to their distributions
of Fake Chance scores (t = .34; p >
.05), with the mean of the combined Fake
Chance distribution differing from the mathematically
exact chance mean and also from
the combined original mean (t = 6.65; p <
.01) but here not from the combined Fake
High mean (t = 1.61; p > .05). The distributions
of amount-of-faking scores (here,
High minus original) may be described as
follows: PCH: M = 26.76, SD = 12.70;
PHC: M = 32.68, SD = 11.51. The observed
difference between these means is nonsignificant
(t = .64; p > .05).

The above presentations have been oriented
toward findings within individual occupational
scales. Since it may also be worthwhile to
consider between occupational scale findings,
the following discussion will emphasize these
findings:

Faking Low. Here, there are three groups
involved: CCL, FCL, and FLC. These three
differ with respect to original score means (F =
29.81; p < .01) due to the significantly lower
original mean score of the CCL group. However,
when we compare the Fake Low mean
scores of the three sets we obtain an F of
less than 1.0, which, of course, means acceptance
of the null hypothesis. (For the
record, the distributions of original minus
Fake Low scores for the three sets differ significantly:
F = 5.18; p < .01.) Thus, we may
combine our three distributions of Fake Low
scores into a single Fake Low distribution
with a mean equal to 4.65 and a standard
deviation of 12.28.

Faking High. The three sets involved here
are CCH, PCH, and PHC. Again the distributions
of original scores differ significantly
(F = 45.17; p < .01) due to the significantly
higher original mean scores of the CCH
group. However, here also we find that the
three distributions of Fake High mean scores
do not differ significantly (F = 1.78; p >
.05). Since these differences are nonsignificant,
it again follows that the three distributions
of difference scores, High score minus
original score, must differ significantly (F =
4.99; p < .01). Combining the three sets of
Fake High scores yields a total distribution
with a mean of 47.60 and a standard deviation
of 11.78.

Faking Chance. When the seven experimental
groups are compared with respect to
their distributions of Fake Chance scores, the
following results are obtained: df between
groups = 6; df within groups = 168; F =
2.08; p > .05.

This finding of nonsignificance is the most
significant finding of the entire study and enables
us to combine all chance results into a
single distribution with a mean of 39.71 and
standard deviation of 15.15. The standard
error of this grand mean is 1.15 which, at
the .01 level (t = 2.605), gives a range of
± 2.99, which includes none of the three occupational
scale chance means! (The same is
true of the standard error of the grand standard
deviation: 15.15 ± .81.)
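
The standard errors quoted for the combined Fake Chance distribution can be checked with a short Python sketch. It assumes the combined N is the full 175 subjects and that the usual large-sample formulas were used (SD/√N for the mean, SD/√(2N) for the standard deviation); neither assumption is stated in the text, though both reproduce the printed figures closely.

```python
import math

n = 175             # assumed: all seven groups of 25 combined
grand_mean = 39.71  # combined Fake Chance mean, from the text
grand_sd = 15.15    # combined Fake Chance SD, from the text
t_01 = 2.605        # critical t at the .01 level, as quoted in the text

se_mean = grand_sd / math.sqrt(n)      # standard error of the grand mean
half_width = t_01 * se_mean            # half-width of the .01 range
se_sd = grand_sd / math.sqrt(2 * n)    # large-sample standard error of the SD

print(f"SE of mean = {se_mean:.2f}")
print(f".01 range  = {grand_mean - half_width:.2f} to {grand_mean + half_width:.2f}")
print(f"SE of SD   = {se_sd:.2f}")
```

The half-width comes out near 2.98 when the unrounded standard error is used and near 3.00 when 1.15 is used, so the printed 2.99 is consistent with either to within rounding.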

Excluding the CCS group, we find 18 sub- 
jects did indeed fake chance when instructed 
to do so. This figure may be contrasted with 
the 19 subjects who faked chance when they 
were instructed to fake directionally. This 
difference is nonsignificant: Fourfold chi 
square = 0; p> .99. (It must be confessed 
that when this study was originally designed, 
the investigator had hopes that the subjects 
would, when instructed to fake directionally, 
inadvertently fall into the chance area with 
much greater frequency than they did. That 
this did not happen was due to the ability of 
the subjects to fake directionally so well.) 

Considering, finally, the CCS group, we 
find that of the 25 subjects, 21 could fake 
chance when they were instructed to “respond 
same” but only 7 could fake chance when 
they were specifically instructed to fake 
chance. This difference is significant: Fourfold
chi square = 13.717; p < .001.
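
Both fourfold chi-square values in this section can be reproduced if one assumes that the Yates continuity correction was applied. That is an inference from the arithmetic, not something the text states, and the function name below is hypothetical. The 150 used for the directional comparison is the six non-CCS groups of 25 subjects each.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]] with Yates continuity correction."""
    n = a + b + c + d
    # The correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# CCS group: 21 of 25 in the chance area under "respond same", 7 of 25 under "fake chance".
ccs = yates_chi_square(21, 4, 7, 18)

# Other six groups (150 subjects): 18 in the chance area under "fake chance",
# 19 under directional faking.
directional = yates_chi_square(18, 132, 19, 131)

print(f"CCS comparison:         {ccs:.3f}")
print(f"Directional comparison: {directional:.3f}")
```

Without the correction the CCS value would be about 15.9 rather than 13.7, which is why the corrected form appears to be the one used.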


CONCLUSIONS 


The present findings offer additional evi- 
dence that there are individual differences in 
ability to fake directionally. In addition, these 
data offer some evidence that “faked” group 
means tend to stabilize. Garry (1953) found, 
for example, that his Carpenter group had a 
Fake High mean standard score of 53. This 
compares favorably with the present finding 
of 49 for Carpenters and 47 for Printers, a 
nonsignificant difference. Garry does not, un- 
fortunately, report enough data to permit a 
test of significance of differences to be run 
between all sets of data. As further evidence,
now of a stabilized “floor,” in the present 
investigation the three sets of Fake Low 
scores were not significantly different. 

Thus, since each group of subjects began 
their faking from a different position, the 
evidence is clear that some scales are more 
easily faked than are others. However, any 
given scale seems as easily faked upward as 
downward. The evidence for this assertion is 
found in the Carpenter scale Fake High and 
Fake Low findings. It will be recalled that the 
amount-of-faking scores of the two groups did 
not differ significantly. Now, the CCH and 
CCL groups began their faking from an 
average standard score of 25.54. This standard
score is approximately equivalent to a
raw score of -12. Since the total raw score
range is 400 points, plus 200 to minus 200,
neither group is restricted in possible range.
In raw score terms, the CCL group averaged
a raw score of -104, which is 92 raw score
points of movement, while the CCH group
averaged a raw score of 95, which is 107
raw score points of movement. The “movement”
difference between these two groups
is a nonsignificant 15 points.


More importantly, it seems clear that
“chance” as an instructional set is meaningless.
The college subjects in the present investigation
simply could not easily obtain
scores by chance, in the same manner that
they could easily obtain high or low scores on
these skilled trades occupational scales. It
seems inappropriate, therefore, to continue
interpreting so-called chance scores as being
due to chance. This is true whether we are
making the interpretation as counselors or as
researchers since, in the former case, we are
ignoring most of our data for an inappropriate
reason and, in the latter case, we are introducing
a variable that, being both unpredictable
and also unmanipulatable, is, therefore,
conceptually unsatisfying and experimentally
unusable. Indeed, rather than to ignore scores
within the chance area, it seems better to
ignore the chance areas themselves! Removing
these gray chance areas from the SVIB
report form will enable practicing counselors
to interpret all scores as per their empirically
derived meanings.
This study continues a program devoted to the development of a system of
factored homogeneous item dimensions (FHIDs) in the area of personality
questionnaires. 6 multiple choice items were used for each of 32 personality
dimensions and 4 validation scales. These items were dispersed in a
questionnaire and administered to 506 male and female volunteer Ss,
preponderately college students. 3 factor analyses of items, each with 72 items
from 12 of the 36 dimensions, revealed 14 FHIDs for which every item had a
loading of .5 or more. Factor analysis of the 36 total FHID score variables
plus 9 background data variables resulted in the major personality factors:
shyness, dependence, dominance, hostility, and compulsiveness.

Scales developed to measure personality
traditionally have tended to be complex in
factor content. This condition is especially
characteristic of those scales developed by
selecting items which discriminate between
criterion groups. Factor analytic studies of
the items in the Minnesota Multiphasic Personality
Inventory scales, for example, illustrate
this phenomenon (Comrey, 1957, 1959;
Comrey & Levonian, 1958). Such complex
scales suffer in certain respects when compared
with factor pure scales. Some of the
advantages which could be claimed for factor
pure scales are: (a) A person’s score on a
factor pure scale is less ambiguous since there
is only one variable involved. On a complex
scale, a given score may be achieved, for example,
by a high score on one factor component
and a low one on another, or vice
versa. (b) In using such scales for investigating
other phenomena thought to be related
to personality traits, correlations are
more interpretable since only one factor is
involved. For complex scales, the correlation
does not reveal which of the component factors
is or are involved. (c) The differential
pattern of an individual’s personality trait
structure can be delineated more clearly with
pure factor measures. With complex trait
measures, differences in relative position on
the elements making up the complex tend to
be obscured by combining several factors into
a composite. (d) In predicting criteria of adjustment,
optimum weighting can be applied
to pure factor measures, whereas in complex
measures, the weights for the individual factor
components in the complex are fixed by
the characteristics of the scale.

1 The computations for this study were carried out
on the IBM 709, operated by the Western Data
Processing Center, University of California, Los Angeles.
Support for the research came from a grant by
the University of California.

In an attempt to develop a system of pure 
factor personality dimensions, several previ- 
ous research studies were undertaken to de- 
velop Factored Homogeneous Item Dimen- 
sions, hereafter referred to as FHIDs (Com- 
rey, 1961; Comrey & Soufi, 1960, 1961). A 
FHID represents a collection of items which 
have been written to measure a particular de- 
fined trait of personality and which have sub- 
stantial loadings on a factor obtained from a 
factor analysis of these items with items meas- 
uring other dimensions. Thus, in developing 
a FHID, the steps are as follows: define the
trait, construct items which should measure
the trait, factor analyze these items with
others which were designed to measure different
traits, select those items designed for the
FHID which actually appear together on an
emerging factor with substantial loadings, revise
and repeat as is necessary to obtain high
loadings for all items used.

Homogeneity does not guarantee factor
purity, of course, and since this procedure 
can result in homogeneous but not factor pure 
FHIDs, it is evident that not all FHIDs will 
prove equally valuable. Factor analysis of 
large collections of FHIDs, however, will be 
helpful in revealing which FHIDs seem to be
relatively pure factor measures and which 
seem to be more complex. Those FHIDs of 
greatest factor purity may be developed to 
adequate reliability levels for use individu- 
ally, or they may be combined with other 
FHIDs defining the same factor, thereby pro- 
viding a composite score of broader meaning. 

The present research represents another 
step in a program devoted to the develop- 
ment and analysis of a system of FHIDs. 
This research is based upon the belief that 
development of a sufficiently extensive sys- 
tem of FHIDs eventually will make it pos- 
sible to select a relatively pure factor subset 
to cover the entire domain of personality 
which can be described by means of inven- 
tories. It is expected that such a subset of 
relatively pure factor FHIDs as a group will 
be superior to systems of complex personality 
scales now in use, both for description and 
for prediction of human behavior. 


PROCEDURE

The Sample

Five hundred and six subjects were obtained as
volunteers from both student and nonstudent
sources. The students were taken principally from
psychology classes at the University of California,
Los Angeles. The nonstudents consisted of interested
friends and family of student participants and
some general population cases obtained by door to
door soliciting in an upper middle class neighborhood
in west Los Angeles. Some information on the
sample will be provided by the following breakdown:
student females, 229; student males, 177;
nonstudent females, 68; nonstudent males, 32. Certain
biographical information also was obtained on
the respondents. These data were coded and treated
as additional variables in an analysis to be described
later in this paper. The coding and statistical results
were as follows for these biographical variables
(means and sigmas are given in parentheses):

37. Age (22.12, 8.65) numerical values used without coding

38. Sex (.41, .49) 0 for female, 1 for male

39. Marital Status (.19, .39) 0 for single, 1 for others

40. Occupation (.19, ) 0 for student, 1 for others

41. Birthplace (.10, .30) 0 for United States, 1 for others

42. Highest School Grade Completed (12.79, 1…)
numerical values used without coding

43. Religion (.53, .50) 0 for protestant, 1 for others

44. Race (.09, .28) 0 for Caucasian, 1 for others

45. Political Preference (.88, .87) 0 for democrat,
1 for independent or no preference, and 2 for republican
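
The coding scheme can be checked against the sample breakdown for the one variable where both are given, Sex. The Python sketch below is only an illustration: it assumes the reported sigma for a 0/1 variable is the population form √(p(1 − p)), which the text does not state, and the variable names are hypothetical.

```python
import math

# Sample breakdown as reported in the text.
counts = {
    ("student", "female"): 229,
    ("student", "male"): 177,
    ("nonstudent", "female"): 68,
    ("nonstudent", "male"): 32,
}

n = sum(counts.values())
n_male = sum(c for (_, sex), c in counts.items() if sex == "male")

# With 0 for female and 1 for male, the mean is the proportion of males.
p_male = n_male / n
sd_male = math.sqrt(p_male * (1 - p_male))  # assumed population-form SD

print(n, round(p_male, 2), round(sd_male, 2))
```

The result agrees with the (.41, .49) reported for variable 38, which suggests the binary variables were scored exactly as the coding notes describe.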


The Inventory 


Thirty-two personality dimensions were selected
for study along with four validation scales and nine
background information variables. Many of these
variables had been investigated in previous studies
of this series (Comrey, 1961; Comrey & Soufi, 1960,
1961) as well as by others in the past (Cattell, 1957;
Guilford, 1959). The four validation scales included
were: Truth scale, designed to indicate the extent to
which the individual is willing to reveal unflattering
information about himself; Acquiescence scale, designed
to reveal the extent to which the subject
tends to deviate systematically from the middle response
in one direction or the other; Validity scale,
designed to reveal if the subject is responding in a
rational manner; and the Social Desirability scale,
designed to reveal the extent to which the subject is
distorting his responses in the direction of social desirability.

Six items were used for each of the 36 dimensions
or scales. The 216 items were spaced in a booklet
such that the items measuring a given dimension
were well separated. Subjects’ responses were recorded
on a separate answer sheet. These consisted
of numerical answers taken from one of two available
scales.

Scale X had the possible responses:

1. Never
2. Almost never
3. Rarely
4. Occasionally
5. Sometimes
6. Frequently
7. Very frequently
8. Almost always
9. Always

Scale Y had the possible responses:

1. Absolutely not
2. Very definitely not
3. Definitely not
4. Probably not
5. Possibly
6. Probably
7. Definitely
8. Very definitely
9. Absolutely


The answer scales were printed on both sides of 
the answer sheet. Each item number in the test 
booklet was followed either by an X or a Y to indicate
to the subject which response scale he should
use.

Each of the 36 dimensions will be listed below together
with a sample item. The mean score, standard
deviation, and reliability coefficient, respectively, will
be given in parentheses following each dimension
name. To estimate reliability, the average Pearson
correlation coefficient among the six items in the
pool was computed, estimating the reliability of a
one-item test. This coefficient was corrected by the
Spearman-Brown formula to a reliability coefficient
for a six-item test.
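
The reliability estimate described here (average inter-item correlation, stepped up to six items by the Spearman-Brown formula) can be sketched in Python as follows. The correlation matrix is hypothetical and the function names are illustrative; only the formula itself, r_k = k·r / (1 + (k − 1)·r), is standard.

```python
def mean_interitem_r(corr):
    """Average of the off-diagonal entries of a square correlation matrix."""
    k = len(corr)
    pairs = [corr[i][j] for i in range(k) for j in range(k) if i < j]
    return sum(pairs) / len(pairs)


def spearman_brown(r_single, k):
    """Reliability of a k-item test given the reliability of a one-item test."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical six-item pool in which every pair of items correlates .40.
k = 6
corr = [[1.0 if i == j else 0.40 for j in range(k)] for i in range(k)]

r_bar = mean_interitem_r(corr)          # estimated reliability of a one-item test
reliability = spearman_brown(r_bar, k)  # stepped up to a six-item test

print(f"mean inter-item r = {r_bar:.2f}, six-item reliability = {reliability:.2f}")
```

With an average inter-item correlation of .40 the six-item estimate is .80, which shows how quickly even moderately correlated items yield a usable scale.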

1. Self-Sufficiency (32.0, …) 1X. Being by myself is enjoyable for me.

2. Succorance (25.0, 7.55, …) 2X. I depend upon people to help me with my …

3. Need to Excel (33.5, …) … ways … been very important … life.

4. Thoughtfulness (35.4, …) … think about deep, serious qu…

5. Helpfulness (36.6, 6.65, …) … I look for opportunities to help others.

6. Acquiescence (29.1, 2.70, .53) … is hard to make a good living these days.
… item was 51Y. It is easy to get a good job. Two other such pairs were
included. … the reliability, signs of the correlation were ignored.

7. Talkativeness (30.0, 7.5…) 7X. I am a talkative person.

8. Welfare of Loved Ones (…) … try to give my family …

9. Truth Scale (26.9, …) … people I know.

10. Impatience (28.9, …) … impatient.

11. Friendliness (…) … enlarge my circle of friends.

12. Rhathymia (23.9, …) … I think the most important thing in … to enjoy oneself.

13. Need for Freedom (…) …X. I like to be free of any deman…

14. Depression (17.7, …) … I feel blue and depressed.

15. Sensitivity (23.7, 8.…, .86) 15X. I feel upset even by slight criticism.

16. Cynicism (28.1, 6.00, .8…) 91Y. Most public officials would accept bribes if
they were large enough.

17. Shyness (24.8, 8.84, .9…) 92X. I find it difficult to talk with a person I
have just met.

18. Ascendance (34.1, 7.…) 93X. When people try to stuff their i… it, I object.

19. Hostility I (…) 94X. Even when someone does a go… cize.

20. Drive to Finish (…) 95X. When I start a job, I finish it.

21. Cautiousness (…7.4, …, .77) 96Y. I would rather have a modest income I can
count on than a large but unsteady one.

22. Aggression (17.…) 97X. When I get mad I break things.

23. Conformity (25.9, …) …X. I feel better doing what ever…

24. Validity (…) … write the number “1” … question … Other question …
respectively, as answers.

25. Self Control (36.2, 7.7…, .85) 100X. No matter how angry I am, I control
my temper.

26. Psychopathic Personality (15.4, 6.3…, .81) 101Y. If a person is stupid
enough to be cheated, I wouldn’t hesitate to cheat him.

27. Culture (35.6, 8.92, …) … like books that are considered cla…

28. Need for Order (…) … sets me to be in a messy …

29. Activity (33.7, 6.…) …

30. Personal Grooming (…) … am careful of my persona…

31. Love of Food and … (…) 106Y. I enjoy eating more …

32. Impulsiveness (26.8, 6.7…) … headlong into things without k… am getting into.

33. Sex Drive (27.2, …) … I will always be interested in sex.

34. Social Desirability (36.0, 6.48, …) All the items will be given for this
dimension. 109X. My table manners at home are just as good as when I eat out.
130Y. I give as much as I can to worthwhile charities. 151X. I give handicapped
people every break I can. 172X. If the … too much money to my account, I would
return the money to the bank. 193X. In choosing friends I ignore things like
race, religion, and political beliefs. 214Y. I would rather go to jail for a
month than … of a jam by telling a lie.

35. Hostility II (17.4, 6.95, …) … Most people are a burden to society.

36. Need for Approval (…) 111Y. It is important for me to be accepted in my
community.

The 36 dimensions were divided into three groups of 12 … purposes of …
items comprising the fir… correlated by Pearson … same procedure …
dimensions and al… analyses of 72 …tions. In each case … minimum residual …
results comparable … method but has the … of communality. Rot…cally by the
norma… In addition to th… factor analysis of total dimension score… out.
Tota… computed for … variables … were added biographical da… variables
described … variables were intercorrelated, usi… factor analyzed by the …
analyses.
RESULTS 


Description in detail of these factor studies
is evidently impossible in a short article. Only
a brief indication of the results will be given.
For each of the three analyses of items, 15
factors were extracted and rotated. A factor
emerged in one of the analyses for each of
the hypothesized factors. The Acquiescence
scale split into three factors and the Social 
Desirability scale split into two. With few 
exceptions, items for the hypothesized per- 
sonality factors appeared with loadings of .3 
or more on those factors and only on those 
factors. One or two residual factors appeared 
in each analysis. Of the 32 personality scales 
used, omitting the validation scales, 14 had 
factor loadings of .5 or more for every item 
written for the scale. Seven additional scales 
had five out of six items with loadings of .5 
or more. Seven scales had four items with 
loadings of .5 or more, and three scales had 
three such items, while the remaining scale 
had only two items with loadings of .5 or 
more on the hypothesized factor. The dimensions
for which all six items used had loadings
of .5 or more on the appropriate factor
were:


Self-Sufficiency 
Succorance 

Need to Excel 
Thoughtfulness 
Talkativeness 

Welfare of Loved Ones 
Depression 

Cynicism 

Shyness 

Hostility I 

Drive to Finish 
Psychopathic Personality 
Culture 

Need for Order 
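
The tally described in this section (how many of a scale's six items load .5 or more on the hypothesized factor) can be illustrated with a short Python sketch. All scale names and loadings below are hypothetical, invented for illustration; the use of absolute values, so that negatively keyed items count, is also an assumption not stated in the text.

```python
THRESHOLD = 0.5  # "loading of .5 or more", per the text


def count_salient(loadings, threshold=THRESHOLD):
    """Number of items whose absolute loading reaches the threshold."""
    return sum(1 for value in loadings if abs(value) >= threshold)


# Hypothetical loadings of each scale's six items on its own hypothesized factor.
scales = {
    "Scale A": [0.62, 0.58, 0.71, 0.55, 0.66, 0.50],
    "Scale B": [0.61, 0.48, 0.57, 0.52, 0.70, 0.53],
    "Scale C": [0.35, 0.52, -0.64, 0.41, 0.29, 0.44],
}

# Group scales by how many of their items are salient.
tally = {}
for name, loadings in scales.items():
    tally.setdefault(count_salient(loadings), []).append(name)

# Scales that would qualify as fully homogeneous under the all-six criterion.
fully_salient = tally.get(6, [])

print(tally)
print(fully_salient)
```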


Although some of their items had loadings
less than .5, the following dimensions will be
given also, because they proved to be important
in defining major factors in the factor
analysis of total dimension scores:

11. Friendliness
23. Conformity
30. Personal Grooming
35. Hostility II
36. Need for Approval

Space does not permit the presentation of
these dimensions here, but they have been
deposited with the American Documentation
Institute (ADI).* For each of the dimensions
just mentioned the six items used are given.
Each item is followed by its mean, standard
deviation, and factor loading, respectively.
Items for the remaining dimensions will not
be given since they cannot be regarded as
sufficiently refined at the present time.

* The items and item statistics for certain dimensions
as well as Table 1, the correlations among all
total dimension scores, and Table 2, the rotated factor
matrix, have been deposited with the American
Documentation Institute.

In the factor analysis of total dimension 
scores, the most prominent factors of interest 
in the personality domain under investigation 
were: 

Factor II—Shyness. Variables with significant
(i.e., .3 or more) loadings on this factor
were: Shyness, .71; Friendliness, -.68;
Talkativeness, -.57; Activity, -.39; Sensitivity,
.36; and Depression, .32.

Factor IV—Dependence. Loadings of .3 or
more were: Conformity, .66; Need for Approval,
.61; Succorance, .61; Self-Sufficiency,
-.42; and Sensitivity, .35.

Factor V—Dominance. Significant loadings
were: Impatience, .57; Need for Freedom,
.57; Ascendance, .52; Impulsiveness, .49;
Aggression, .42; Self-Control, -.40; Sensitivity,
.35; Need to Excel, .32; and Rhathymia, .31.

Factor IX—Hostility. Significant loadings
were: Hostility II, .69; Psychopathic Personality,
.66; Cynicism, .58; Hostility I, .54;
Depression, .44; Rhathymia, .36; Helpfulness,
-.35; Social Desirability, -.32; and
Impatience, .30.

Factor XII—Compulsiveness. Loadings of
.3 or more were: Need for Order, .68; Drive
to Finish, .63; Personal Grooming, .53;
Cautiousness, .51; Welfare of Loved Ones, .45;
Need for Approval, .39; Social Desirability,
.38; Self-Control, .34; Rhathymia, [value
illegible]; and Impulsiveness, -.31.

Factor III had loadings of .86 for Age,
[value illegible] for Marital Status, .83 for
Occupation, [value illegible] for Highest
School Grade Completed, and .3 for Welfare
of Loved Ones. It is an age factor. Factor XI
had loadings of -.69 for Religion and .50 for
Political Preference, prompting the name,
Protestant Conservatism. Factor VI had
loadings of .67 for Sex and .44 for Need to
Excel. It might be called Masculinity. Factor
XIII had loadings of .45 on Sex Drive, .41 for
Love of Food and Drink, -.37 on Birthplace,
and .3 on Rhathymia. It appears to be a form
of Hedonism. Factors I, VII, X, XIV, and XV
had no loadings as high as .4.

DISCUSSION


Dependence, Hostility, and Compulsiveness 
were clearly identified factors also found with 
similar variables and loadings in a previous 
study. The present Shyness factor is similar 
to what was called Friendliness in that analy- 
sis. A notable portion of the variance for the 
Shyness dimension was concentrated in a fac- 
tor of Neuroticism in the earlier study. Neu- 
roticism did not appear in the present in- 
vestigation since several of the most impor- 
tant dimensions defining it were not included. 
The present Dominance factor, although still 
not well defined, bears some relationship to 
a factor in the previous investigation which 
had loadings in the forties for Aggression 
and Impulsiveness. The loading for Sex Drive 
there was .56 while in this analysis it was 
only .26. More FHIDs will be needed with 
heavier loadings on this factor before its nature
can be clearly established.

The 19 dimensions for which items and 
item statistics have been deposited with the 
ADI, although needing revision in some cases, 
constitute an effective nucleus of FHIDs 
which can be expanded to cover a greater 
portion of the personality domain. Factor
analysis of total scores from a considerably 
expanded collection of FHIDs should provide 
reliable information on the factor composi- 
tion of human personality as measured by 
the inventory. The cumulative results of this 
and the previous studies would seem to sug- 
gest, however, that four of these ultimate 
factors are likely to be very similar to the 
present factors of Shyness, Dependence, Hos- 
tility, and Compulsiveness. By combining 
those FHIDs which best define these factors, 
it is possible to obtain factor scores of greater 
reliability and broader meaning, as noted 
above. The Shyness factor, for example, may 
be obtained as a sum of scores from the FHID 
of Shyness and those from the inverse scores 
for Friendliness and Talkativeness. The Dependence
factor may be obtained as a sum
of the FHID scores for Conformity, Need 
for Approval, and Succorance. Factor scores 
for Hostility may be obtained by adding 
FHID scores from Hostility II, Psychopathic 
Personality, and Cynicism. Finally, the Com- 
pulsiveness factor may be obtained as a sum 
of the FHID scores for Need for Order, Drive 
to Finish, and Personal Grooming. 
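
The combining rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the respondent's scores are invented, the name FACTOR_RECIPES is hypothetical, and an "inverse score" is assumed to mean the FHID score reflected about an assumed maximum (max_score minus score), which the article does not spell out.

```python
# Which FHIDs define each factor score, following the text.
# "subtract" lists FHIDs that enter as inverse scores (Shyness factor only).
FACTOR_RECIPES = {
    "Shyness": {"add": ["Shyness"],
                "subtract": ["Friendliness", "Talkativeness"]},
    "Dependence": {"add": ["Conformity", "Need for Approval", "Succorance"],
                   "subtract": []},
    "Hostility": {"add": ["Hostility II", "Psychopathic Personality", "Cynicism"],
                  "subtract": []},
    "Compulsiveness": {"add": ["Need for Order", "Drive to Finish",
                               "Personal Grooming"],
                       "subtract": []},
}


def factor_scores(fhid_scores, max_score):
    """Sum the defining FHID scores; inverse FHIDs are reflected first.

    The reflection (max_score - score) is an assumption for illustration.
    """
    result = {}
    for factor, recipe in FACTOR_RECIPES.items():
        total = sum(fhid_scores[name] for name in recipe["add"])
        total += sum(max_score - fhid_scores[name] for name in recipe["subtract"])
        result[factor] = total
    return result


# Invented FHID totals for one respondent (not data from the study).
example = {
    "Shyness": 20, "Friendliness": 10, "Talkativeness": 12,
    "Conformity": 15, "Need for Approval": 18, "Succorance": 11,
    "Hostility II": 7, "Psychopathic Personality": 9, "Cynicism": 14,
    "Need for Order": 22, "Drive to Finish": 25, "Personal Grooming": 19,
}
scores = factor_scores(example, max_score=30)
print(scores)
```

Unit weighting is used here because the text speaks only of sums; weighting each FHID by its factor loading would be a different procedure from the one described.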

In addition to being useful for the ultimate 
purpose of helping to define the structure of 
personality, the 19 FHIDs deposited with 
the ADI can serve a more immediate pur- 
pose. These constitute a set of personality 
variables, with internal consistency reliabili- 
ties between .77 and .91, which can be em- 
ployed in research programs where criteria 
of adjustment are to be related to inventory 
personality variables. Some of the above men- 
tioned benefits of using pure factor scales may 
be expected to result from their application. 
The total scores for the 19 scales may be used 
individually, of course, but in some circum- 
stances it will be desirable to use the factor 
scores described above. 

Of some interest is the fact that the vari- 
able Social Desirability had only three rela- 
tively small loadings of .25 or more, .39 on 
the unimportant tenth factor, .38 on the 
Compulsiveness factor, and —.32 on the Hos- 
tility factor. Since Social Desirability has 
sometimes been described as accounting for 
most of the variance in personality inven- 
tories, all the items used to measure this 
variable were given in the description of 
dimensions presented above. Study of the 
correlations in Table 1 shows that although 
Social Desirability had several correlations in 
the range of .30 to .46, it also had a sub- 
stantial percentage of very low correlations. 
The Acquiescence variable proved even less 
pervasive, having only one loading greater 
than .25, namely, a value of .39 on the minor 
tenth factor. 

The reader will be able to judge for himself
the extent to which the items employed
can be considered adequate measures of
Social Desirability and Acquiescence. Although
these response set variables may not
have been assessed adequately, at least it can
be said that the measures employed to represent
them in this study proved to be relatively
independent of the personality factors
obtained.
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SHOW-Z-MINX: 


A NOTE ON CODE LETTERING

VALENTINE APPEL AND RANDOLPH J. HERNANDEZ

[...] letters of the alphabet can be read upside down as well as right
[...] letters as code designation [...] in blind product testing [...]
effect upon respondents’ perceptions of the product, [...] possibility
of settling or separation [...] An incident [...] use of one of these
letters had the effect of introducing [...] into a product test of a
packaged dry food [...]

A common practice in the testing of consumer
packaged goods is to place the test
product in a plain package devoid of identifying
marks except for a code letter. Horton
and Mecherikoff (1960) have demonstrated
the existence of reliable letter preferences,
and as a precaution against this possible source
of bias, researchers give careful consideration
to the selection of the particular code
designations to be employed. To the writers’
knowledge, however, no one has ever demonstrated
that the bias which might be introduced
by the choice of particular code
letters is sufficient to affect the outcome of
an actual product test. The purpose of the
present note is to report such an instance.

Two alternate versions of a packaged dry
food product were being consumer tested. In
analyzing the data pertaining to perceived
differences between the two product versions,
a surprising number of references were noted
relating to different particle sizes, when in
fact the particle sizes were identical. On
closer inspection of the boxes in which the
products were packaged, there came the
realization that the code letter, which was
employed to designate the product version
referred to as having the smaller particle size,
was reversible. The code letter could be read
either right side up or upside down. The
code letter used for the other version could
only be read right side up.

On the basis of this difference in the nature
of the code letters employed, it was reasonable
to conclude that without the benefit of
other identifying marks, the product with
the reversible code letter must surely have
been opened by the respondents from the
bottom of the box more frequently than was
the other product. It was therefore hypothesized
that the perceived difference in
particle size was attributable, not to actual
product differences, but rather to the effects
of settling to the bottom of the box. Because
the product with the reversible code letter
was more likely to have been opened from
the bottom than was the other version, this
version tended to be perceived as having the
smaller particles.

A simple experiment was performed to test
the hypotheses: (a) that there were no perceivable
differences in particle size between
the two product versions, and (b) that the
effects of settling would have been sufficient
to account for the reported differences.
Briefly, this experiment consisted of a series of
paired comparisons using five naive subjects
and measured samples obtained from six boxes
of each of the two product versions, opened
separately from top and bottom. For each
comparison the subject was asked to indicate
which of the two samples appeared to have
the larger particle size. The result was that
no measurable effects were noted as a function
of product version, thereby confirming
the fact that the particle sizes were identical.
In every instance, however (i.e., 12 boxes out
of 12), the sample taken from the top of the
box was more often judged to have larger
particles than the sample from the bottom of
the box. The conclusion, that the use of reversible
code letters (SHOW-Z-MINX) can
have a significant effect upon the respondents’
perceptions of a test product, was thereby
supported. This conclusion is particularly relevant
when other identifying marks do not
clearly identify the top of the test package
from the bottom, and when there is the
possibility of settling or separation of the product.
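
The note reports the 12-out-of-12 result without a significance test. As an illustrative check only, and not the authors' analysis, the following Python sketch computes how likely such a run would be if top and bottom samples were equally likely to be judged larger in any one box. Treating the 12 boxes as independent trials with probability .5 is an assumption, and the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """One-sided exact binomial probability of at least `successes`
    favorable outcomes in `trials` independent trials when p = 0.5."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# 12 boxes, and in all 12 the top sample was more often judged larger.
p = sign_test_p_value(12, 12)
print(f"{p:.6f}")  # 1/4096, about 0.000244
```

Under these assumptions a clean sweep of 12 boxes would arise by chance roughly once in four thousand experiments, which agrees with the note's reading that settling, not chance, produced the difference.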
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SOCIOMETRIC RATINGS AS PREDICTORS 
OF ACADEMIC PERFORMANCE 


LEON A. ROSENBERG, T. B. McHENRY, AND A. M. ROSENBERG
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An experiment to determine the degree of accuracy 
with which sociometric ratings can be used to predict academic performance. A
14-item sociometric questionnaire was administered to 86 students before any
academic grades were available. 2 rating measures, one emphasizing future job
performance and the other dealing with personal adjustment, were found to be
significantly related to academic performance (r = .40, p < .01; r = .35,
p < .01). The relationship, however, appeared to be restricted to the upper
range of academic performance; the measures being unable to predict failing
or below average performance.


The present investigation is concerned with 
the use of sociometric ratings as predictors 
of academic success. Previous research has 
utilized sociometric ratings to predict subjects’ 
success or failure in various training programs 
such as Officers Candidate School (Williams 
& Leavitt, 1947), Woman Marine and Wave 
recruit training (Rigby, Sayers, Ossorio, & 
Wilkins, 1957; Wilkins, Rigby, & Ossorio, 
1959), and Air Force basic training (Flyer, 
1961). 

In the study by Wilkins, Rigby, and Os- 
sorio (1959) a sociometric questionnaire was 
administered to a large number of Wave re- 
cruits early in their training. The question- 
naire consisted of seven pairs of items (one 
“positive” and one “negative” in each pair) 
which dealt with different aspects of military 
performance and with friendship preferences. 
The sociometric ratings discriminated accu- 
rately between those recruits who were either 
separated from service, required to repeat the 
training, or were judged to be in the lowest 
quarter of their company by two independent 
judges, and those who were judged to be in 
the upper quarter of their graduating group. 
None of their ratings, on the other hand, were 
correlated with objective grade averages. The 
authors stated that the “strongest items’’ were 
the two which asked: “Who are the five 
women in your company who you think 
will make the best Waves?” and “From the 
women you know in the company, list the five 
who seem to have the most trouble fitting in 
at boot camp.” 


The purpose of the present study was to 
cross-validate these results using a male popu- 
lation and objective grade averages as criteria. 


PROCEDURE 

The subjects were 86 enlisted students attending 
three military courses. These subjects had success- 
fully completed basic military training (8 weeks) 
and an introduction to Army medical field activities 
(2-8 weeks) before entering these courses. A small 
number of subjects had completed anywhere from 
3 to 16 years of military service before re-enlisting 
and selecting one of these courses. It is apparent that 
these subjects had already demonstrated their ability 
to “adjust” to the Army as compared to the Wilkins, 
Rigby, and Ossorio (1959) study where the subjects 
were first starting their basic training.

After 12-14 days of the course had elapsed, and 
prior to any academic examination, the sociometric 
questionnaire was administered. The questionnaire 
was the same as that devised by Wilkins et al. (1959) 
with the only modifications being such changes as
referring to “students” and “class” instead of “Waves”
and “company.” The questionnaire items were as
follows:



1. Which five students in your class do you think 
will perform the best on the job? 

2. Which five students in your class do you think 
will perform the worst on the job? 

3. Which five students in your class do you personally
like the best?

4. Which five students in your class do you personally
like least?

5. Which five students in your class are easiest
to live with?

6. Which five students in your class are hardest
to live with?

7. Which five students in your class would you
like best to take orders from?

8. Which five students in your class would you
least like to take orders from?

9. Which five students in your class would you
like most to take home on furlough?

10. Which five students in your class would you
not care to take home on furlough?

11. Name five students in your class who are
always full of good ideas.

12. Name five students in your class who never
seem to think up anything new.

13. From the students you know in the class, list
the five who seem to have the least trouble fitting
in during this training period.

14. From the students you know in the class,
list the five who seem to have the most trouble
fitting in during this training period.


RESULTS 


Sociometric scores were obtained for each 
subject for each of the two questions which 
Wilkins et al. (1959) reported as their most 
highly discriminating items. In this prior 
study, the number of times a subject was nomi- 
nated for each question constituted this sub- 
ject’s sociometric scores. In the present study 
the positive and negative forms of each ques- 
tion were utilized. For example, the number 
of times a student was nominated for Item 2 
was subtracted from the number of times his 
peers selected him for Item 1. This procedure 
yielded one score representing a_ subject’s 
sociometric status in terms of the “best on 
the job” items. The same procedure was fol- 
lowed for Items 13 and 14. These two socio- 
metric scores were referred to as performance 
status, and adjustment status, respectively. 

One additional sociometric measure was
taken. The sociometric questionnaire contained
seven positive (or accepting) questions
and seven negative (or rejecting) questions.
The total number of times a subject
was nominated for any one of the seven nega- 
tive questions was subtracted from the total 
number of times his name appeared in re- 
sponse to the positive questions. This score 
represented the sociometric status of each in- 
dividual. 
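
The three scoring rules can be sketched as follows. This is a minimal illustration, not the authors' scoring program: the ballot format, the student names, and the function name are invented, and the odd-numbered items are taken as the positive forms and the even-numbered items as the negative forms, as in the questionnaire listed above.

```python
from collections import Counter

POSITIVE_ITEMS = {1, 3, 5, 7, 9, 11, 13}   # accepting questions
NEGATIVE_ITEMS = {2, 4, 6, 8, 10, 12, 14}  # rejecting questions


def sociometric_scores(ballots):
    """ballots: list of dicts mapping item number -> list of nominated names.

    Returns, for every nominated student, the three difference scores
    (nominations on positive items minus nominations on negative items).
    """
    counts = Counter()  # (name, item) -> number of nominations received
    for ballot in ballots:
        for item, names in ballot.items():
            for name in names:
                counts[(name, item)] += 1

    students = {name for name, _ in counts}
    scores = {}
    for s in students:
        scores[s] = {
            "performance": counts[(s, 1)] - counts[(s, 2)],
            "adjustment": counts[(s, 13)] - counts[(s, 14)],
            "sociometric": sum(counts[(s, i)] for i in POSITIVE_ITEMS)
            - sum(counts[(s, i)] for i in NEGATIVE_ITEMS),
        }
    return scores


# Two invented ballots, shortened to a few items for readability.
ballots = [
    {1: ["Adams", "Baker"], 2: ["Clark"], 13: ["Adams"], 14: ["Clark"]},
    {1: ["Adams"], 2: ["Baker"], 3: ["Baker"], 14: ["Clark"]},
]
result = sociometric_scores(ballots)
print(result["Adams"])
```

A student who is never nominated does not appear in the output; in practice such a student would presumably receive a score of zero on all three measures.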

The criterion scores were the final academic 
grade-averages achieved by subjects in their 
respective courses. 

The three sociometric scores were correlated
with the criterion with the following results:
performance status, r = .40 (p < .01); adjustment
status, r = .35 (p < .01); sociometric
status, r = .07. An additional
analysis was done with the two sociometric
scores which were found to be significantly
correlated with academic achievement. In
terms of their course grades, the subjects
were divided into three criterion groups: upper
25%, middle 50%, and lower 25%. The
same division was also performed on the basis
of their sociometric scores. The data is presented
in Table 1. This table, for example,
demonstrates that of all those subjects who
ranked academically in the upper quarter of
their respective classes, 63.6% of them had
also ranked in the upper quarter on the first
sociometric measure, 31.8% had scored in
the middle 50%, and 4.5% had been placed
in the lower quarter sociometrically.

TABLE 1
PERCENTAGE OF STUDENTS IN THE UPPER, MIDDLE, AND LOWER SOCIOMETRIC
GROUPS WHO ACHIEVED VARIOUS ACADEMIC STANDINGS

[Table body not recoverable; the one surviving column reads 9.5, 47.6,
42.8, total 99.9.]
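
The grouping and percentage breakdown can be sketched as below. The sketch follows the worked example in the text, which conditions on academic group and reports the share falling in each sociometric group. The eight subjects and their scores are invented, the function names are hypothetical, and ties are broken by sort order, a detail the article does not specify.

```python
def three_way_groups(scores):
    """Label each subject upper (top 25%), middle (50%), or lower (bottom 25%)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    cut = n // 4  # size of each outer quarter (assumes n is at least 4)
    labels = {}
    for position, subject in enumerate(ranked):
        if position < cut:
            labels[subject] = "upper"
        elif position >= n - cut:
            labels[subject] = "lower"
        else:
            labels[subject] = "middle"
    return labels


def percentage_table(academic, sociometric):
    """For each academic group, the percentage of its members found in
    each sociometric group."""
    a = three_way_groups(academic)
    s = three_way_groups(sociometric)
    groups = ("upper", "middle", "lower")
    table = {}
    for group in groups:
        members = [subj for subj in a if a[subj] == group]
        table[group] = {
            g: round(100 * sum(s[m] == g for m in members) / len(members), 1)
            for g in groups
        }
    return table


# Invented grade averages and performance status scores for eight subjects.
grades = {"A": 95, "B": 90, "C": 85, "D": 80, "E": 75, "F": 70, "G": 65, "H": 60}
socio = {"A": 10, "B": 2, "C": 8, "D": 3, "E": 1, "F": 0, "G": -4, "H": -1}
table = percentage_table(grades, socio)
print(table["upper"])
```

Each row of such a table sums to 100% apart from rounding, which is why percentages such as 63.6, 31.8, and 4.5 total 99.9.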


DISCUSSION 


The overall results of the study indicate
that various sociometric ratings are correlated
with the academic grades obtained by
students in these military courses. This is in
contradiction to the report of Wilkins, Rigby,
and Ossorio (1959) who stated that none of
their sociometric questionnaire items were related
to academic grades alone. Although the
correlation coefficients indicate that two of
the sociometric measures are significantly related
to academic grades, the efficiency of this
instrument is not very high. Table 1 indicates
that the degree of false prediction would be
quite high if this instrument were to be used
as a screening tool.

To a certain degree, the present findings 
serve as a cross-validation of the Wilkins et al. 
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(1959) study. It is apparent that the total 
sociometric questionnaire (the sociometric 
status measure) is not as effective as the 
measures utilizing only the “best on the job” 
and “most trouble fitting in” items. The results,
however, indicate that these sociometric
ratings are much more accurate in the
prediction of successful academic perform- 
ance than they are in identifying the poor 
student. 

Flyer (1961) reported on a similar study 
which utilized 640 airmen who were begin- 
ning basic training when tested. His data 
indicated a positive relationship between a 
peer-rating score (almost identical to the 
present performance status measure) and on-the-job
evaluations. Flyer’s data also indicated
that the prediction of superior performance
was more accurate (in terms of
there being no “false positives”) than the
prediction of poor performance (there being
a fairly high number of “false negatives”).
The studies by Flyer (1961) and Wilkins 
et al. (1959), however, still demonstrated a 
greater ability to predict failing performance 
than did the present study. 

The most likely explanation of the differ- 
ence between these studies lies in the differ- 
ences between their subjects. In the present 
study, the subjects could be considered to be 
a fairly select group. Many of those who 
would not do well in advanced military 
schooling had most likely been already elimi- 
nated during the many weeks of training 
which preceded these courses. This restric- 
tion in range in terms of the criterion would 
lower the validity correlations. It is suggested, 
then, that the sociometric questionnaire is of 
greater value in situations where the “base 
rate” is quite high. When one is dealing with 
a problem requiring finer discrimination, the
sociometric procedure is still of value but
much less so.
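
The effect of restriction in range on a validity coefficient can be illustrated with a small simulation. Everything here is assumed for illustration: the data are synthetic, the predictor and criterion are built to share a common "ability" component, and the prior eliminations are imitated by simply discarding the lower half of the criterion distribution.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
predictor, criterion = [], []
for _ in range(5000):
    ability = rng.gauss(0, 1)
    predictor.append(ability + rng.gauss(0, 1))
    criterion.append(ability + rng.gauss(0, 1))

full_r = pearson_r(predictor, criterion)

# Imitate prior selection: keep only subjects in the upper half on the criterion.
cutoff = sorted(criterion)[len(criterion) // 2]
kept = [(p, c) for p, c in zip(predictor, criterion) if c >= cutoff]
restricted_r = pearson_r([p for p, _ in kept], [c for _, c in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

The correlation in the selected group comes out noticeably lower than in the full group even though the underlying relationship is unchanged, which is the argument made above for the weaker validities found with an already screened population.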

It is also possible for us to discuss the
efficiency of these sociometric ratings relative
to the results obtained when more objective
instruments are used. In an earlier study
(Rosenberg, McHenry, Rosenberg, & Nichols,
in press), students enrolled in these same
courses were administered the California Psychological
Inventory (CPI), the scales of
which were correlated with the academic
grades obtained by each student. A measure
of intellectual ability was also included as a 
potential predictor. This was the General 
Technical Score (GT) obtained from two subtests
of the Army Classification Battery, a
paper-and-pencil aptitude test administered 
to all enlisted personnel upon entrance into 
service. The use of the GT as a measure of 
intellectual ability is based on the results of 
a study by Montague, Williams, Lubin, and 
Gieseking (1957), in which the two subtests 
which are averaged to give the GT were found 
to correlate well with the Wechsler-Bellevue 
Form 1. 

The CPI, a paper-and-pencil personality in- 
ventory with 18 scales, yielded 7 scales sig- 
nificantly correlated with academic grades 
(p < .01). The r’s ranged from .35 to .46. 
The GT correlation was .58 (p < .01). The 
use of multiple prediction equations (GT plus 
one scale of the CPI) increased the r to .60. 
A comparison of this data with the correla- 
tions obtained with the two sociometric meas- 
ures (performance status and adjustment 
status), indicates that a standardized meas- 
ure of intellectual ability is a much better 
predictor than a peer-rating procedure in 
this particular situation. Scattergrams of GT 
and multiple prediction scores with academic
grades also indicated that the accuracy of
prediction was not confined to the upper
range of performance; both of these measures
could be used to predict failing or poor
performance.


The relatively poor ability of peer ratings
to predict failing academic performance may
be a function of the feelings evoked when
those in authority require one to evaluate a
peer. There is some evidence that as subjects
become more aware of the administrative outcome
of their ratings (e.g., “washing out” a
fellow student), they become increasingly
reluctant to “damn” a fellow group member.
As part of an on-going research project
(Rosenberg & McHenry, 1961) two small
classes were administered the sociometric
questionnaire with different explanations of
their use. One class first answered the questionnaire
under the impression that the results
were for research purposes only (a “research
set”). The next day they again filled
out the questionnaire but were told that the
faculty had “decided to utilize the findings
in evaluating each student’s potential to work 
in this field” (an “administrative set”). The
second class answered the questionnaire once 
and under an administrative set. 

The data obtained from the first class was 
examined to determine what changes might 
be caused by this shift in set in terms of the 
responses to each question. The predictive ac- 
curacy achieved in the second class was com- 
pared to that of similar classes tested in the 
past. 

In the test-retest situation, the changes 
which occurred indicated a definite tendency 
to increase the number of names nominated 
for each question except Number 14. The 
subjects became quite reluctant to judge the 
psychological adjustment of their peers. The 
increase in choices (up to the maximum of 
five) and a tendency to change their choices 
for negative questions, caused a decrease in 
validity of the instrument. In the second 
class, where one administration under an ad- 
ministrative set was the only procedure used, 
the prediction of the lower academic quartile 
became completely impossible. The general
interrelationships of sociometric position and
academic achievement remained, but only for 
the upper academic criterion groups and only 
in terms of the performance status measure. 

Although subject to the findings of current
work aimed at increasing the N and cross-validating
the initial findings, it appears that
the poor prediction of failing performance is
related to the subject’s unwillingness to be
the instrument of another’s downfall. The
data also suggests that when these studies
leave the realm of research and become administrative
procedures (as is currently being
advocated to an increasing degree), the
prediction of poor performance may decrease 
even further. It is felt that these findings 
raise a question as to just what peer ratings, 
or sociometric measures, can add to the more 
traditional objective measures of the indi- 
vidual’s own intellectual capacity, aptitudes, 
vocational interests, and motivation. 
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GOOD AND POOR INTEREST ITEMS 
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Interest items may be evaluated in terms of unfamiliarity, ambiguity 


, differentiation between men-in-general and criterion group, number of scales on which
the items are weighted, reliability, and extent to which they contribute to total 
score. These characteristics are utilized in the elimination of inferior items and 
addition of new items in revising the Vocational Interest Blank for Men. Like, 
indifference, dislike (L-I-D) items are contrasted with preference between items 
in Parts VI and VII and ratings of present abilities in Part VIII. In general, 
good items are located midway between the interests of criterion groups and 
are liked distinctly more than the average by some criterion groups and
distinctly less by other criterion groups.


A revised Vocational Interest Blank for 
Men will be published in the near future. The 
revision involves elimination of inferior items, 
addition of new items, and rewording of some 
items; also revision of the format of the blank 
and rewording of directions. These changes in 
turn necessitate revision of scales and rede- 
termination of norms. 

On what basis may interest items be evalu- 
ated as “good” and “poor’’? Interest items 
have been accepted or rejected according as 
they improved the total score of an interest 
scale. The extensive research of K. E. Clark 
(1961) has added greatly to our understand- 
ing of how to measure the contribution of 
items and how to select the best ones. But no 
one, as far as I know, has considered the
characteristics of an interest item, aside from 
its contribution to total score, and has es- 
tablished standards by which they may be 
deemed good and poor. This article outlines 
the standards used in accepting and rejecting 
items for the revision of the blank but no 
claim is made that the final answer to the 
question, what are good and poor items, is 
given herewith. 

Many letters have been received over the 
years complaining about obsolete items and 
unfamiliar items but no one suggested that 
there were items that made little or no con- 
tribution to the differentiation of occupational 
groups. Recently about 30 counselors were 
asked which items should be discarded or re- 
worded. Some supplied the opinions of their 
assistants as well as their own so that about 
50 replies were received. About half of the 
items were criticized by one or more “experts.”
There is good agreement as to which
items are obsolete but judgments vary greatly 
as to which items are of little use. Such judg- 
ments are not too reliable since the “experts” 
differ appreciably and many of the criticized 
items are useful in terms of objective stand- 
ards. 

The greatest difficulty confronting one in 
revising items is that one is working in the 
dark, or possibly we should say in the dusk. 
One does not know how many different inter- 
ests there are and therefore does not know 
whether all the different interests are meas- 
ured by items on the blank. One also does 
not know whether the same basic interest is 
measured by one or many items. Intercorrela- 
tion among the items would provide a work- 
able basis for excluding items where too many 
measure the same interest. But if 10 items 
correlated .60 to .70 with each other, should 
half be excluded because of the common in- 
terest or should all be retained since each is 
measuring to some degree one or more other 
interests that may need to be considered? 

Such complexities are of supreme impor- 
tance if the objective is to measure interests 
as such. But the objective is otherwise; it is 
to differentiate men engaged in different oc- 
cupations and thus to aid young people to 
find the occupation best suited to them. The 
value of the inventory depends upon how 
well occupational groups are differentiated 
and how well such differentiations aid in coun- 
seling. The value of any item correspondingly 
depends upon how much it contributes to such 
purposes. When we know more about how 
many interests there are, their nature, how 


‘ 
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will be able 
inventories. 


they change with ages, etc., we 
to develop better interes 
interest items 
evaluation of items. Instead of one single 
standard for such evaluation at least four 
standards may be recognized, two involving 
judgment and two objective measurement. 
The first two standards are (a) unfamiliarity 
and (0) ambiguity and the last two are (c) 
differentiation between criterion groups and 
men-in-general and (d) number of scales on 
which the items are weighted. There does not 
seem to be any way of combining all four 
standards into and consequently un- 
familiar and ambiguous items have first been 
eliminated and the remainder ranked in terms 
of the two objective standards. (Two other 
standards, reliability and stability of re- 
sponses to items, are considered in a later 
section of this article.) 

1. Unfamiliarity—Items should not be used 
when they are unfamiliar to all, or nearly all, 
those to be tested. Obsolete items are those 
which may have been familiar in the past but 
are not so now. Only a few items have been 
so classified. Examples are: Number 275 
Bolshevists, the 10 items Numbers 301-310, 
listing outstanding men a generation ago, and 
several magazines no longer published. A good 
rule is to omit names of living persons and 
present day books, movies, magazines, etc. 
There are other items, such as Number 22 
Certified Public Accountant, which are not 
obsolete but are unknown to most college 
students. But items familiar to some and not 
to other testees are frequently good items. 




Familiarity is to some extent an expression 
of interest—lawyers and engineers differ not 
only in their expressed interests but in their 
awareness of certain elements of their envi- 
ronments related to their interests. The most 
tenable explanation as to why the judgments 
of our counselors differ so greatly with regard 
to many items is that they are not familiar 
with everything and their judgments are cor- 
respondingly colored. How can overall and 
partial unfamiliarity be distinguished? 

2. Ambiguity—Items are ambiguous because
they may be interpreted in two or more
ways (as Number 39 Florist, who may raise
flowers or sell them), or because the items
are poorly expressed. In general, the more
words used in expressing an item the greater
the chance of ambiguity. Some items have 
been reworded with this in mind. 

3. Difference between category responses
(like, indifference and dislike) of criterion
group (C) and men-in-general (P)—The
effectiveness of a scale is best expressed by the
percentage of overlapping of the scores of C 
and P. The effectiveness of an item, its share 
in differentiating C and P, may be expressed 
in various ways but they all represent the 
category differences between C and P. Items 
may accordingly be ranked in terms of their 
largest category difference. 

Table 1 gives the category differences be- 
tween P and C of two items, Numbers 1 and 
2. In the case of Item Number 1, the three 
category differences between C and P are 
20%, -6%, and -14%. With Item Number 2,
the three category differences are 16%, 0%,
and -16%. The largest category difference
of Item Number 1 is 20% and of Item Num- 
ber 2 is 16%. On the basis of largest cate- 
gory difference Item Number 1 would be 
ranked as superior to Item Number 2. 
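
The ranking rule just described can be sketched in a few lines of Python. This is only an illustration: the percentages for P and C below are hypothetical, chosen so that the largest differences come out at 20 and 16 points as in the text, and the function names are not taken from the source.

```python
# Hypothetical response percentages in the order (Like, Indifference, Dislike).
# They are illustrative only; the actual Table 1 values are not reproduced here.
item_1 = {"P": (30, 40, 30), "C": (50, 34, 16)}
item_2 = {"P": (40, 30, 30), "C": (56, 30, 14)}


def category_differences(criterion, general):
    """Return C minus P for each response category, in percentage points."""
    return tuple(c - p for c, p in zip(criterion, general))


def largest_category_difference(criterion, general):
    """Return the largest absolute category difference between C and P."""
    return max(abs(d) for d in category_differences(criterion, general))


# Rank items from most to least differentiating.
ranked = sorted(
    {"Item 1": item_1, "Item 2": item_2}.items(),
    key=lambda pair: largest_category_difference(pair[1]["C"], pair[1]["P"]),
    reverse=True,
)
```

Because each set of three percentages sums to 100, the three category differences for an item always sum to zero, which gives a quick check on any tabulated row.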

4. Number of scales on which item is
weighted—If an item is not weighted on any
scale it is obviously useless. Possibly the ideal 
inventory would contain 1,000 items and each 
of 50 scales would be weighted on 20 items, 
none of which is weighted on more than one 
scale. We cannot imagine, however, that any- 
one could develop such an inventory today. 
In an inventory of 400 items which is
expected to differentiate 50 occupations it is
necessary to employ items that occur on
several scales.

TABLE 1
RESPONSES OF MEN-IN-GENERAL (P) AND A CRITERION
GROUP (C) TO TWO ITEMS
[Table body not recoverable from the source scan.
Columns: Like, Indifference, Dislike (percentage
of responses). Rows: P, C, and Difference for
Item 1 and for Item 2.]

TABLE 2
NUMBER OF SCALES, AMONG 32 SCALES, ON WHICH
THREE ITEMS ARE WEIGHTED ACCORDING TO MINIMUM
CATEGORY DIFFERENCES OF 6%, 10%, 12%, 14%, AND 16%
[Table body not recoverable from the source scan.
Rows: Items 387, 365, and 398; mean of 40 items
in Part VIII; mean of all items on blank.]

The number of scales upon which an
item is weighted depends upon what is the
minimum category difference acceptable for
weighting. The category differences between
criterion groups and men-in-general vary
greatly with any given item. If differences of
6% are weighted then items will be weighted
on more scales than if differences of 16% are
required. Table 2 shows that the responses of
32 criterion groups to Item Number 387 differ
from the responses of men-in-general by 10%
on only one scale and by 6% on only two
scales. This is a very poor item as it is
weighted on 2 scales on the very low standard
of 6% category difference. The median
item in the table is weighted on 4 scales when
the standard of 16% category difference is
required and is retained on the revised blank
but it is not as good an item as the last item
in the table.
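
How the count of weighted scales shrinks as the minimum category difference is raised can be shown with a short sketch. The list of differences below is hypothetical (it is not data from the blank), and `scales_weighted` is an illustrative name.

```python
def scales_weighted(largest_differences, minimum):
    """Count criterion groups whose largest category difference meets the minimum."""
    return sum(1 for d in largest_differences if abs(d) >= minimum)


# Hypothetical largest category differences (percentage points) between
# criterion groups and men-in-general for one item; not data from the blank.
example_item = [3, -7, 11, 18, -16, 5, 0, 13]

# Number of scales on which the item would be weighted at each standard.
counts = {m: scales_weighted(example_item, m) for m in (6, 10, 12, 14, 16)}
```

The count can only stay the same or fall as the standard rises, which is why an item weighted on few scales at 6% is a poor item at every stricter standard.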

If every category difference of 10% is
weighted, a tally of 32 scales shows that the
399¹ items are weighted on the average on
14.4 scales (see Table 2). If the standard of
weighting items is raised from 10% to 16% the
average is 7.0 scales.

In revising the items on the blank,
unfamiliar, obsolete, and ambiguous items were
first discarded and then items were discarded
which were weighted on fewer than four scales
where the minimum category difference of
16% is required. Exceptions to the two
objective standards have been made in a few
cases. All told 298 of the 399 items on the
blank have been retained.


Discarded Items 


The 399 items on the blank are distributed
in Table 3 in terms of the number of scales
on which the items are scored. For example,
16 of the 399 items were weighted on 0 scales,
29 on 1 scale, 31 on 2 scales, and 24 on 3
scales, totaling 100 items scored on less than
4 scales. The 399 items average 7.0 scales per
item. The 101 discarded items average much
lower, i.e., 3.9 scales per item, than the 298
retained and reworded items, i.e., 8.0 scales
per item. It is rather ironical that the 20
obsolete items which have been so criticized
over the years averaged 8.3, the highest of
any subgroup. Only 60 of the 100 items were
discarded because they were scored on less
than 4 scales, see Column 6 of the table.
Eight items of the remaining 40 items were
rejected because they were obsolete or
unfamiliar, 13 have been reworded, and 19
retained for a variety of reasons. Five items
(Numbers 294, 298, 312, 315, and 318)
serve a useful purpose in contrast to other
items in Part VI; several may influence
responses to contrasting items, such as
Number 234—“Progressive People” (1 scale),
and Number 235—“Conservative People” (6
scales); or represent interests not otherwise
included, as Number 20—“Cartoonist” (2
scales), and Number 191—“Handling Horses”
(3 scales).

TABLE 3
RETAINED AND DISCARDED ITEMS DISTRIBUTED
ACCORDING TO NUMBER OF SCALES ON WHICH THE
ITEMS WERE SCORED
[Table body not recoverable from the source scan.
Scoring is based on a category difference of 16%.]

¹ Item Number 400 has never been used so that
the total number of items is 399.
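
The item counts and averages quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Figures as stated in the text.
discarded_n, discarded_mean = 101, 3.9   # discarded items, scales per item
retained_n, retained_mean = 298, 8.0     # retained and reworded items

total_items = discarded_n + retained_n

# Weighted average of scales per item over the whole blank.
overall_mean = (
    discarded_n * discarded_mean + retained_n * retained_mean
) / total_items

# Items scored on 0, 1, 2, and 3 scales respectively.
low_scale_items = 16 + 29 + 31 + 24

# Disposition of the 40 low-scale items that were not discarded for low scoring.
remaining = 8 + 13 + 19
```

The weighted mean comes to roughly 6.96, which rounds to the 7.0 scales per item reported for all 399 items.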

Our counselors were very largely in agree- 
ment in condemning the 20 obsolete items al- 
though they had been useful on the average 
on 8.3 scales, see Column 4 of the table. But 
that does not mean that they would be use- 
ful today. The obsolete items included maga- 
zines no longer published; 10 names of men 
not so well known today as previously; and 
items such as Number 75—Railway Conduc- 
tor, Number 275—Bolshevist, and Number 
321—Street-Car Motorman versus Street-Car 
Conductor. 


Twenty-one items were classified as
unfamiliar, as Number 22—Certified Public
Accountant and Number 28—Consul; ambigu-
ous, as Number 36—Factory Worker and 
Number 392—Rebel Inwardly at Orders from 
Another, Obey When Necessary; similar to 
other items, as Number 16—Bookkeeper and 
Number 60—Mechanical Engineer; and Num- 
ber 256—Sick People which is replaced by 
two new items—Physically Sick People and 
Mentally Sick People. 

A considerable number of items have been 
reworded. In his Career Pattern Study, D. E. 
Super reworded many items of the blank to 
conform to Grade 7 vocabulary. We have 
adopted many of his changes, employing also 
suggestions from other sources. We have had
to rely on our own judgment and that of a
few associates as to which of all the
suggestions to adopt. Similarly we have had to
decide whether the rewording would affect
responses to the items and in consequence affect
weighting and eventual scores. When we felt
such changes would result we called the item
a “new” item. But 49 of the reworded items
we felt would not appreciably change
responses and these items are classified as
“reworded” items.

The eight parts of the blank vary appre- 
ciably in the number of scales on which the 
items are weighted, see Table 4. The items in 
Part I, Occupations, are the most effective 
and the items in Part V, Peculiarities of Peo- 
ple, the least effective. 

The optimum men-in-general group (point 
of reference, P) is one that is located in the 
center of the occupational groups (Strong, 
1943, Ch. 21 and 22). Presumably the same 
principle applies to items. With aptitude tests 
the ideal is for 50% of testees to pass the 
item and 50% to fail. With interest tests the 
ideal is for half of the criterion groups to 
like the item and for half to dislike it. If we 
accept the principle that the optimum item is 
located midway between the interests of cri- 
terion groups then we might expect that such 
an item would be weighted on several scales. 
The relations involved here are illustrated in
Table 5 where 32 criterion groups are
distributed according as the largest percentage
of criterion responses of criterion groups
differs from the responses of P, with respect to
the two items Number 2—Advertiser and
Number 62—Musician. When the standard
for selection of items is a category difference
of 16%, Item Number 2—Advertiser will be
weighted +1 on 5 scales, -1 on 9 scales, and
0 on 18 scales. This is a useful item whereas
Item Number 62—Musician is barely
acceptable as it is weighted +1 on only four
scales and -1 on no scales. In other words,
when no criterion group differs by 16% from
P then no scales would be weighted on that
item. When the criterion groups differ greatly
from P then many of them will be weighted
on the item, but presumably not all, since in
any distribution of criterion groups some
groups should differ to a small amount from
men-in-general, which group approximates the
average of all criterion groups.

TABLE 4
AVERAGE NUMBER OF SCALES AMONG 32 SCALES PER
ITEM IN PARTS I TO VIII OF THE MEN'S BLANK IN
TERMS OF 16% WEIGHTING
[Table body not recoverable from the source scan.
Rows: I. Occupations; II. School Subjects;
III. Amusements; IV. Activities; V. Peculiarities
of People; VI. Order of Preference of Activities;
VII. Comparison of Interest between Two Items;
VIII. Rating of Present Abilities and
Characteristics; Total.]

TABLE 5
DISTRIBUTION OF 32 CRITERION GROUPS IN TERMS OF
PERCENTAGE DIFFERENCE OF CATEGORY RESPONSES
BETWEEN CRITERION AND MEN-IN-GENERAL GROUPS
WITH REFERENCE TO ITEMS ADVERTISER AND MUSICIAN
[Table body not recoverable from the source scan.
Columns: percentage difference of category
responses between criterion and men-in-general
groups; Item 2, Advertiser; Item 62, Musician.]

A good item should accordingly be one that 
is located midway between the interests of 
criterion groups and furthermore should be 
one where some criterion groups like the item 
distinctly more than the average and other 
criterion groups like it distinctly less than the 
average. 
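
The unit weighting described above can be sketched as follows. The sign convention (positive when the criterion group exceeds men-in-general) and the list of 32 differences are assumptions made for illustration; the list is constructed only so that the tally reproduces the 5, 9, and 18 scales reported for Item Number 2.

```python
def unit_weight(difference, minimum=16):
    """Return +1, -1, or 0 according to whether the signed difference
    from men-in-general (percentage points) reaches the minimum."""
    if difference >= minimum:
        return 1
    if difference <= -minimum:
        return -1
    return 0


def weight_tally(differences, minimum=16):
    """Count how many scales would carry each unit weight for one item."""
    weights = [unit_weight(d, minimum) for d in differences]
    return {"+1": weights.count(1), "-1": weights.count(-1), "0": weights.count(0)}


# Hypothetical signed differences for 32 criterion groups, built to mirror
# the split reported for Advertiser; these are not values from Table 5.
advertiser_like = [20] * 5 + [-18] * 9 + [4] * 18
tally = weight_tally(advertiser_like)
```

An item that meets the requirement stated above shows counts in both the +1 and the -1 rows, since some groups like it distinctly more and others distinctly less than the average.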

Part VIII is one of the poorer parts of the
blank, very largely because many of the items
violate the requirements of good items. The
items have been criticized because men
overestimate their abilities in responding to them.
It has been contended, however, that the
norms are based on such estimates and
consequently the “fudging” of individuals is more
or less taken into account in the scoring.
There is, however, the problem of the
non-fudger, and how badly he might be penalized.

The items may also be criticized on the 
ground that they are not interest items. But 
as far as the writer recollects none of our 
“experts” criticized them on this ground; 
in fact, they did not suggest that any of the 
items should be discarded, only that some 
items be reworded. 

The items appeal to many business men, as
well as others. This is not a paramount
consideration, but even not very good items
should be considered if they dispel the idea
that filling out the blank is a lot of nonsense.

Eighteen of the items are poor, i.e., not 
scored on at least four scales. The poor items 
may be distinguished from the good items by 
being general in nature—not specific, and 
by representing the favorable characteristic 
checked by most men regardless of occupa- 
tion. Compare, for example, the three poorest 
and three of the better items in Part VIII. 
Most men would like to believe they are “ap- 
proachable,” “reticent concerning confidential 
affairs,” and “can discriminate between more 
or less important matters.” Moreover, these 
abilities are useful in most occupations. On 
the other hand, many men know they cannot 
“write a concise, well organized report,” “put 
drive into an organization,” and “follow-up 
subordinates effectively,” and they do not 
feel it is necessary to do these things in order 
to be successful in their work. Production and 
sales managers, for example, claim they pos- 
sess these abilities much more than men-in- 
general, whereas scientists and lawyers claim 
they possess these qualities to less degree. 

Seventy-three percent of men in many
different occupations say they possess the traits
in 14 poor items of Part VIII and 7% say
“No,” in contrast to 50% who say “Yes”
to the 19 good items and 18% who say
“No.”² It is the high percentage of Yes
responses that gives the impression that there
is excessive fudging. The items are not poor
because of overestimation, as such, but
because most men respond in the same manner,
when what is needful is that some
occupational groups shall respond in one way and
the other occupational groups in reverse
manner.

² “Yes” and “No” responses are reversed in four
cases when “No” refers to possession of the good
aspect of the trait, and six items are omitted as
the “?” response indicates the best response.


Selection of New Items 


It is comparatively easy to define a poor 
item in terms of opinions of experts and a 
mass of statistical data accumulated over 
many years. But even here no clean cut ob- 
jective standard has been established and 
personal judgment plays an important role. 
The situation is otherwise when it comes to 
selecting new items. There are, of course, no 
data regarding the new items. The only basis 
for selection is personal judgment of a few 
experts. The objective is to add items not al- 
ready represented and to select those that 
conform with the requirements of a good 
item, previously defined. When only a very
few scales are involved a tryout on criterion
groups is possible but where 50 scales are
involved the cost in time and money is
prohibitive. Unfortunately it is not possible
to employ the usual procedure of testing
sophomores as the responses of college
students do not disclose very well how
occupational groups differ in their responses to items.


Revision of Format 


Several changes in format have been con- 
sidered but only one has been adopted. Re- 
sponses to the last 12 items on the blank have 
been rearranged, for example, Item 398 was 
originally in the form: 

398. (1) My advice sought by many (2) Sought
by few (3) Practically never asked   1st ( ) 2nd ( )
3rd ( )

The item now appears as follows:

398. My advice is asked for   By many ( ) By
few ( ) By almost nobody ( )

We are indebted to Martin Hamburger,
formerly an assistant to D. E. Super in his
Career Pattern Study, for the improvement.

It was known by 1925 that responses of
liking, indifference, and disliking to interest
items gave useful data. Five- and seven-step
responses had been tried out but could not
be weighted with the small amount of data
available from the small samples used in
those days. Someone should consider the
possibilities of employing five- or seven-step
responses instead of three-step with large
samples, particularly in studying permanence
of interests (Strong, 1943, p. 659).

As far as we remember no one had con- 
sidered measuring preference between in- 
terests. More or less to try out this idea, 
comparison between 2 interests was intro- 
duced in Part VII and a sort of rank order 
comparison of 10 interests in Part VI. It was 
confidently expected that the blank would be 
revised before long and the relative efficiency 
of these two formats and the L-I-D type of 
response would be established. But new prob- 
lems arose so continuously that revision kept 
being postponed. Even today the only con- 
clusion is that all three formats are equally 
good and that it is the items themselves that 
are important. Part I is the best and Part V 
the poorest part of the blank, both employing 
L-I-D responses. Part VI employing rank 
order preference is second best, see Table 4. 

Part VI has been criticized by many prin- 
cipally on the ground that many testees do 
not follow instructions. We seriously con- 
sidered changing the format, expressing the 
items in terms of 12 triads. We have, however, 
decided to keep the original format but to 
revise the instructions in the hope that they 
will be clear to all. Ranking items was a new 
practice years ago but occurs in various forms 
in recent tests so that this format is not now 
so unfamiliar. 

Ratings of Present Abilities and Character- 
istics, Part VIII, was also incorporated on a 
try out basis. It seemed possible that men 
might differ with respect to how they rated 
themselves in terms of their interests. How 
far this may be so is still to be determined. 
The facts are that about half of the items are 
poor and the remainder good. It is possible 
that some form of comparison, paired, triad, 
or rank order, might yield better results than
the “Yes, ?, No” format.


Numbering of Items 


All retained items have been given the same 
number as on the original blank. New items
have been inserted in the place of discarded
items. This will facilitate test-retest studies 
in which the old and revised blanks are used. 
(Many a time we wished that this had been 
done when the old blank of 420 items was 
compared with the blank of 400 items.) If 
the numbering had not been retained addi- 
tional “occupational” and “activity” items 
might have been substituted for “amusement” 
and “people” items, many of which are poor. 
Retaining the old numbering made it im- 
possible to change the format of Part VI. 

Revision of Directions. Minor changes in 
the wording of directions have been made. 
The greatest change pertains to Part VI, 
mentioned above. 


Reliability and Stability of Responses to
Items


An item may be deemed reliable if the
category differences between responses of C
and P have a critical ratio of 2.7 or more.
Items must have a category difference of
about 17% to meet this requirement, when
both C and P have populations of 100. When
both have populations of 400 the needful
category difference is reduced to 9%. Actually
P has a much larger population than 400 and
criterion groups vary in size from 113 with
YMCA secretary to 1,048 with Psychologist,
revised scale, and average 328. For nearly all
scales an item with a category difference of
12% is reliable, and category differences of
14% and 16% are still more satisfactory,
statistically speaking. We do not need to consider
reliability as a fifth standard since it is
included under the third standard (difference
between category responses) discussed above.
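
The sample-size figures above can be approximated with the usual standard error of a difference between two independent proportions. This is a sketch under stated assumptions: the source does not give the formula it used, and the pooled response proportion of .30 is a hypothetical value chosen because it yields figures close to those quoted.

```python
import math


def standard_error(pooled, n_c, n_p):
    """Standard error of the difference between two independent proportions,
    using a pooled proportion (an assumed, conventional formula)."""
    return math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_c + 1.0 / n_p))


def critical_ratio(p_c, p_p, n_c, n_p):
    """Difference between criterion and men-in-general proportions
    divided by its standard error."""
    pooled = (p_c * n_c + p_p * n_p) / (n_c + n_p)
    return (p_c - p_p) / standard_error(pooled, n_c, n_p)


def minimum_difference(pooled, n_c, n_p, ratio=2.7):
    """Smallest difference in proportions that reaches the given critical ratio."""
    return ratio * standard_error(pooled, n_c, n_p)


# Assumed pooled proportion of .30; group sizes as in the text.
needed_100 = minimum_difference(0.30, 100, 100)
needed_400 = minimum_difference(0.30, 400, 400)
```

Under these assumptions the required difference is about .175 for groups of 100 and about .087 for groups of 400, in line with the "about 17%" and "9%" stated above. A pooled proportion nearer .50 would raise both figures somewhat.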

The possible sixth standard, i.e., stability
of responses to an item, has not been used in
differentiating good and poor items. For
another purpose 99 college students gave their
responses to a sample of 114 items as worded
on the present blank and again 3 days later
to items as worded on the revised blank.
The percentage of identical responses
between test and retest was calculated for each
item. The mean was 78.6, σ = 6.3, for those
items similarly worded on test and retest with
a range of 60-90. Items with a stability ratio
of less than 60% for a 3-day interval might
be deemed undesirable.
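
The stability ratio for a single item is simply the percentage of respondents giving the same response on both occasions. A minimal sketch follows; the ten paired responses are hypothetical (L, I, D for like, indifference, dislike), and the 60% cutoff is the one suggested in the text.

```python
def percent_identical(test, retest):
    """Percentage of respondents whose response to an item is the same
    on test and retest."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and of equal length")
    same = sum(1 for first, second in zip(test, retest) if first == second)
    return 100.0 * same / len(test)


def is_unstable(test, retest, cutoff=60.0):
    """Flag an item whose stability ratio falls below the cutoff."""
    return percent_identical(test, retest) < cutoff


# Hypothetical responses of ten students to one item on two occasions.
first_occasion = list("LLIDDLIDLL")
second_occasion = list("LLIDLLIDLD")
stability = percent_identical(first_occasion, second_occasion)
```

With these illustrative responses two of the ten students change category, so the item would not be flagged.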


MANUAL PERFORMANCE DURING COLD EXPOSURE
AS A FUNCTION OF PRACTICE LEVEL AND THE
THERMAL CONDITIONS OF TRAINING


R. ERNEST CLARK AND CLARKE E. JONES


Quartermaster Research and Engineering Command, Natick, Massachusetts


3 groups of 10 Ss each were given varied thermal experience (warm or cold
hands) during 3 weeks of training on a standard manual task. The results
were as follows: (a) 1 day of cold-hand training significantly reduced the
size of a manual decrement usually associated with cold exposure, but
continued cold experience did not; (b) skill level on the task per se did not
interact with the cold induced performance decrements; and (c) the thermal
conditions associated with performance on the task appeared to become part
of the stimulus complex eliciting correct manual responses when these thermal
conditions were maintained for a large number of trials, i.e., the Ss learned,
not merely to perform on the task, but to perform with warm, or cold, hands
specifically.


The loss of manual capability during cold 
exposure has been studied with regard to the 
roles of ambient temperature and/or duration 
of exposure (Bader & Mead, 1951; Clark, 
1961; Horvath & Freedman, 1944; Teichner, 
1957), synovial fluid viscosity (Hunter, Kerr, 
& Whillans, 1952), body temperature (Gay- 
dos, 1958; Gaydos & Dusek, 1958), hand 
surface temperature (Clark, 1959, 1961; 
Dusek, 1957; McCleary, 1953), and the rate 
of change in hand surface temperature (Clark 
& Cohen, 1960; Cohen & Clark, 1961). In 
each of these studies the subjects have been 
relatively unskilled and have had little rele- 
vant cold experience prior to, or during, the 
experimental exposure. It is not presently 
known, therefore, whether highly skilled sub- 
jects actually exhibit performance decrements 
in the cold, or, if they do, whether they can 
learn to overcome the manually restraining 
effects of cold. It was the purpose of the 
present investigation to evaluate these pos- 
sibilities by training different groups of sub- 
jects to high levels of skill on a commonly 
used manual task, and by studying the 
effects of cold exposure and relevant thermal 
experience on their performance throughout
learning. The needed clarification should
thus be obtained regarding subjects’
contribution to the initial and boundary conditions
within which previously determined
relationships may be expected to hold.


METHOD 


Thirty essentially nude subjects were exposed to
constant ambient temperature conditions of 70° F.
dry bulb and 60° F. wet bulb (approximately 50%
relative humidity). Localized hand cooling was
accomplished by exposing the subject’s hands and
forearms to 10° F. air within a precision cooling
unit. Hand skin temperatures (HSTs) were monitored
with copper-constantan thermocouples attached to the
dorsal side of both fifth fingers, and were
continuously recorded on a Leeds-Northrup multipoint
recorder.

The standard manual task used at our laboratory
was chosen again for the present work not only
because of its relevance to past research, but because
the performance required appears to be a basic
sort of psychomotor activity, i.e., it conforms well
to the manual and finger dexterity components of
psychomotor performance derived factor analytically
by Fleishman (Factors II and VII, 1954). The task
involves the successive tying of one “overhand and
bight” knot in each of 15 cords hanging from a
rod inside of the hand cooling box. Manual
capability is measured in terms of the time required
to complete the task.

All subjects were trained for 15 days, or three
5-day weeks interrupted by weekends. Each day they
completed the task 10 times, and on Days 6
and 11 (Mondays of Weeks 2 and 3) they “warmed-
up” prior to training with 10 additional trials.
Thus, by the end of the experiment the subjects
had completed the task 170 times, and had tied
2,550 separate knots. This amount of training is
as much as 7 1/4 times that given in earlier studies.
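
The trial and knot totals follow directly from the training schedule. The short sketch below recomputes them from the figures stated in the text; the constant names are illustrative.

```python
TRAINING_DAYS = 15        # three 5-day weeks
TRIALS_PER_DAY = 10       # task completions per training day
WARMUP_DAYS = (6, 11)     # Mondays of Weeks 2 and 3
WARMUP_TRIALS = 10        # extra trials given on each warm-up day
CORDS_PER_TRIAL = 15      # one knot tied in each cord

training_trials = TRAINING_DAYS * TRIALS_PER_DAY
warmup_trials = len(WARMUP_DAYS) * WARMUP_TRIALS
total_trials = training_trials + warmup_trials
total_knots = total_trials * CORDS_PER_TRIAL
```

The totals agree with the 170 task completions and 2,550 knots reported above.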

Although all subjects had an equal number of 
trials on the task, they did not have equivalent 
thermal experience in their training. The 30 sub- 
jects were randomly divided into three 10-man 
groups, and the groups were assigned different ex- 
posure conditions for their 15 days of training. 
Group 1 performed each day with cold hands 
(45° F. HST). Group 2 performed on odd numbered 
days with warm hands (90° F. HST—established 
with a heated muff) and on even numbered days 
with cold hands (45° F. HST). Group 3 performed 
for 10 days with warm hands (90° F. HST) and 
then for the remaining 5 days of the experiment 
with cold hands (45° F. HST). Thus, Group 1 
indicated how well subjects may learn to overcome 
the manually restraining effects of cold; Group 3 
indicated how highly skilled subjects with little 
relevant cold experience are affected when first sub- 
jected to cold; and Group 2 permitted a con- 
trolled comparison between the experimental groups 
at all levels of practice. 
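
The three exposure schedules can be written out as a small lookup, which also makes it easy to count how much cold experience each group had accumulated by a given day. This is an illustrative sketch; the function names are not from the source, and the warm-hand warm-up trials on Days 6 and 11 are deliberately left out of the count.

```python
def hand_condition(group, day):
    """Return "warm" or "cold" for a group's training trials on a given day."""
    if not 1 <= day <= 15:
        raise ValueError("day must be between 1 and 15")
    if group == 1:
        return "cold"                               # cold hands every day
    if group == 2:
        return "warm" if day % 2 == 1 else "cold"   # odd days warm, even days cold
    if group == 3:
        return "warm" if day <= 10 else "cold"      # cold only in Week 3
    raise ValueError("group must be 1, 2, or 3")


def cold_days_through(group, last_day):
    """Number of cold-hand training days a group has had through last_day."""
    return sum(
        1 for day in range(1, last_day + 1) if hand_condition(group, day) == "cold"
    )
```

For example, `cold_days_through(2, 10)` and `cold_days_through(3, 15)` both give 5, so Group 2 at Day 10 and Group 3 at Day 15 have had equal amounts of cold-hand practice at different points in learning.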

During the warm-up trials given prior to the 
respective training trials on the Mondays of Weeks 
2 and 3 (Days 6 and 11), all subjects performed 
with warm hands (90° F. HST). These extra trials 
served a dual purpose: they permitted subjects to 
warm-up on the task after the 2-day weekend, and 
they yielded the only direct comparison of the 
experimental groups under identical exposure con- 
ditions. 




RESULTS AND DISCUSSION 

The combined effects of practice level, 
thermal experience, and exposure conditions 
on manual performance are summarized in 
Figure 1. 

The differences between the average per- 
formance times of the groups over trials were 
evaluated by the techniques of analysis of 
variance and the differences in warm-up
performance by t tests. For convenience in
discussing these findings and their statistical
reliabilities, the data are reported in three
sections: the data of Weeks 1 and 2, the data
of Week 3, and the warm-up data.


Weeks 1 and 2 (Days 1-10) 


Variance analyses of the cold-hand per- 
formance of Group 1 and the warm-hand 
performance of Group 3 over Days 1-10 in- 
dicated significant main effects for days (F = 
12.57; df = 9/180; p < .001), and for groups 
(F = 10.245; df=1/180; p< .005), but 
no groups by days interaction (F = .582; 
df = 1/180). Subsequent analyses, however,
produced a significant interaction term when
only the data of Days 1 and 2 were considered
(F = 8.072; df = 1/56; p < .005). It was
concluded that 1 day of cold-hand training
for the Group 1 subjects significantly
reduced the size of the performance decrement
associated with cold exposure, but that further
cold experience did not.

FIG. 1. Average times required to complete the manual task as influenced by hand surface
temperature, thermal experience, and practice level.

FIG. 2. Learning functions on the present task as influenced by hand surface
temperature level.

Disregarding the data of Group 1 on Day 
1 of the study, it was possible to describe 
the effects of cold exposure on manual per- 
formance by varying the asymptote of a 
smooth hyperbolic curve fitted to the Group 
3 data by the method of “least squares” (see 
Figure 2). These theoretical functions suggest 
for the present task and experimental condi- 
tions that the absolute increase in performance 
time produced by cold exposure is approxi- 
mately equal to 16% of the best (asymptotic) 
times achievable under warm-hand conditions. 
In this case, cold exposure resulted in an 
increase of 3 seconds in manual performance 
times during all phases of training, excepting 
Day 1. 
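
A curve fit of the kind described can be sketched as follows. The functional form time = a + b / day is an assumption (the text says only "hyperbolic"), and the warm-hand times are synthetic, generated from an asymptote of 18.75 seconds. That value is only what the 3-second and 16% figures above imply; it is not stated in the source.

```python
def fit_hyperbola(days, times):
    """Least-squares fit of time = a + b / day, which is linear in 1/day.
    Returns (a, b), where a is the asymptotic time."""
    xs = [1.0 / d for d in days]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic warm-hand times for Days 1-10 (seconds); hypothetical values.
days = list(range(1, 11))
warm_times = [18.75 + 12.0 / d for d in days]

asymptote, slope = fit_hyperbola(days, warm_times)

# Cold-hand curve: same shape, asymptote raised by 16% of the warm asymptote.
cold_shift = 0.16 * asymptote


def predicted_cold_time(day):
    """Predicted cold-hand time under the shifted-asymptote description."""
    return asymptote + cold_shift + slope / day
```

Shifting only the asymptote keeps the cold and warm curves a constant distance apart, which corresponds to the constant 3-second difference described for all phases of training after Day 1.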

To insure that the 3-second difference
between Groups 1 and 3 was not merely due
to implicit, pre-experimental differences in
manual capability, the data of Groups 2 and
3 were compared on odd numbered days, and
the data of Groups 1 and 2 on even numbered
days. Given that Groups 2 and 3 were
equivalent in performance on Day 1, Group 2 may
be considered a subsample drawn from the
same population as Group 3, and the
performance of Group 2 during the cold exposure
days may be considered representative of
Group 3. Analyses of variance supported the
comparability of Groups 2 and 3 on the odd
numbered days (F = .150; df = 1/90) and the
comparability of Groups 1 and 2 on even
numbered days (F = .022; df = 1/90).
Differences in the performance of Groups 1 and
3 may, therefore, be attributed to the present
exposure conditions and not to sampling
errors.


Week 3 (Days 11-15) 


During the third week of the study, reliable
differences existed between all groups each
day, with the exception of Groups 1 and 2 on
Days 12 and 14 (F = 11.00; df = 2/81; p
< .005) (see Figure 1). In this case, however,
no significant trials effect was found (F =
.874; df = 2/81) and no groups by trials
interaction (F = .022; df = 4/81). Highly skilled
subjects, well beyond the point where con- 
tinued training was producing significant im- 
provement in performance, were still exhibit- 
ing performance decrements in the cold. It 
is interesting to note, however, that the rela- 
tive size of the decrement appeared to vary 
with the temporal position of cold trials in 
learning. If cold experience had been re- 
ceived early in learning (as with Groups 1 and 
2), the hindrance to performance was found 
to be less than if these trials had been re- 
ceived later (Group 3). Particularly illustra- 
tive of this finding is the comparison of the 
superior performance of Group 2 to Group 
3 when each had had an equally large number 
of cold-hand trials (Day 10 for Group 2 and 
Day 15 for Group 3). Similar comparisons 
made earlier in cold training are less satis- 
factory because of the desirability of making 
the comparison at a time when the practice 
variable (absolute number of trials or days) 
was least effective, i.e., at the latest possible 
time in training. 

It is suggested as an explanation of the
need for cold experience early in training
that the thermal conditions usually considered
boundary to a performance task may become
part of the stimulus complex that elicits the
correct manual response when these
conditions are maintained for a prolonged number
of trials. If so, the subject should learn, not 
merely to perform on a given manual task, but 
to perform with warm (or cold) hands spe- 
cifically; altering the temperature conditions 
associated with performance should disrupt 
responding, regardless of the direction of the 
change up or down the temperature scale. The 
usefulness of this explanation is well il- 
lustrated in the discussion of the warm-up 
data of Group 1 on Days 6 and 11 of the 
study. 


Warm-up data (Days 6 and 11) 


The warm-up data for the three groups are 
represented in Figure 1 as the points on the 
vertical dashed line intervening between Days 
5 and 6 and Days 10 and 11. The abscissa 
points corresponding to these days were re- 
served for the experimental training data. 
The slight impairments shown in the warm-up
performance of Groups 2 and 3 are expected
since they merely represent the decay
in manual capability over the 2-day week- 
ends, but the similar impairment in the 
performance of Group 1 would normally be 
quite unexpected. Certainly the subjects of 
Group 1 needed warm-up trials too, but if 
warm-up were all they needed, their perform- 
ance should have been approximately equiva- 
lent to the performance of Groups 2 and 3. 
A series of t tests indicated this not to be
the case. The warm-up performance of the 
Group 1 subjects was statistically equivalent 
to their cold-hand performance on the pre- 
ceding Fridays, and generally poorer than the 
warm-up performance of the subjects of 
Groups 2 and 3. 

There appeared to be three sets of factors 
influencing the warm-up scores of Group 1: 
(a) the inhibition resulting from 2 days without
practice, (b) the mechanical facilitation
associated with warm flexible hands, and (c) 
the inhibition produced by changing the 
thermal aspect of the stimulus complex elicit- 
ing the manual responses. Since the first two 
factors should have been common to all 
three groups, it is concluded that the extra 
difficulty experienced by the Group 1 sub- 
jects during their warm-up trials on Days 6 
and 11 of the experiment was related to 
Factor c, altered stimulus conditions. Thus, 
varied thermal experience may be needed 
early in training not only to insure minimal 
effects of cold exposure, but to insure accept- 
able performance under what are usually 
considered optimally warm temperature
conditions.
The purpose of this study was to better comprehend underlying, discriminating
personal characteristics represented in a population of 418 petroleum research
scientists. 5 factors were extracted from a matrix of 75 discriminating life
history items and 3 criteria of research performance. The factors were tentatively
identified as Favorable Self-Perception, Inquisitive, Professional Orientation,
Utilitarian Drive, Tolerance for Ambiguity, and General Adjustment. Profiles
of the 3 criterion groups across the 5 factors revealed great similarity between
the profiles based on ratings, but substantial differences between these
and the patent disclosures profile. The observed differences were interpreted in
terms of their implications for distinctive personnel policies.


The empirically scored life history has 
frequently been identified with a “shotgun” 
approach, the absence of generality, a failure 
to ferret out logical relationships inherent in 
the data, and with “blind empiricism” (Dun- 
nette, 1962; Guilford, 1959, pp. 189-190). 
The matter is well summarized by Dunnette 
who says, “Most users of the method have 
been more intent on achieving statistical pre- 
diction than on gaining an understanding of 
the dynamics of success which may be sug- 
gested by the data.” In a methodological sense 
the present study is, in part, an attempt to 
take account of these justifiable criticisms. 
It is, additionally, an attempt to get behind 
a job label “research scientist” to some sig- 
nificant aspects of performance; in this case 
good output versus creativity, and to under- 
stand the distinction between them in life 
history terms. Again Dunnette has sum- 
marized the issue nicely under the heading of 
subgrouping analysis. 

In this perspective, the specific purposes 
of the present study were two-fold: first, to 
cluster or otherwise classify 75 priorly vali- 
dated life history items with a view to better 
comprehending the underlying personal char- 
acteristics represented; and second, using 
these characteristics as dimensions, to examine
the differential profiles of three criterion
groups and their implications for industrial
personnel policies.

1 Now employed by the Mead Corporation,
Chillicothe, Ohio.


METHOD 


In overview, the method employed was factorial.
A matrix consisting of 75 criterion-keyed life history
items plus three criterion variables was factored
by the method of principal components. The
factors identified were subsequently rotated employing
an orthogonal, normalized varimax program
(Kaiser, 1958). The criterion profiles which followed
were drawn in terms of the factor loadings
of the criterion variables on the dimensions
identified.
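The rotation step can be illustrated with a minimal sketch. It is not the program used in the study: it handles only a single pair of factors, using Kaiser's closed-form rotation angle with row normalization, whereas the study rotated five factors extracted from a 78-variable matrix. The function names and the four-variable loading matrix are hypothetical and chosen only for illustration.

```python
import math


def varimax_pair(loadings, normalize=True):
    """Rotate a two-factor loading matrix to the varimax position.

    `loadings` is a list of (x, y) pairs, one pair per variable.
    With normalize=True each row is first scaled to unit length
    (Kaiser normalization) and rescaled after rotation.
    """
    p = len(loadings)
    if normalize:
        # Row lengths (square roots of communalities); guard against zero rows
        norms = [math.hypot(x, y) or 1.0 for x, y in loadings]
    else:
        norms = [1.0] * p
    rows = [(x / h, y / h) for (x, y), h in zip(loadings, norms)]

    # Quantities entering Kaiser's formula for the rotation angle
    u = [x * x - y * y for x, y in rows]
    v = [2.0 * x * y for x, y in rows]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = sum(2.0 * ui * vi for ui, vi in zip(u, v))
    phi = 0.25 * math.atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p)

    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    rotated = [(x * cos_phi + y * sin_phi, -x * sin_phi + y * cos_phi)
               for x, y in rows]
    # Undo the normalization so communalities match the input
    return [(x * h, y * h) for (x, y), h in zip(rotated, norms)]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squared) / p - (sum(squared) / p) ** 2
    return total


# Hypothetical unrotated loadings: a clean two-cluster structure turned 30 degrees
unrotated = [(0.866, 0.5), (0.866, 0.5), (-0.5, 0.866), (-0.5, 0.866)]
rotated = varimax_pair(unrotated)
for row in rotated:
    print(tuple(round(value, 3) for value in row))
```

In this toy case the rotation brings the first two variables close to (1, 0) and the last two close to (0, 1), while each variable's communality is unchanged. A full varimax program applies the same idea repeatedly to every pair of factors until the criterion stops increasing.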

As detailed in an earlier paper (Smith, Albright, 
Glennon, & Owens, 1961), the population of sub- 
jects for this study consisted of 418 petroleum 
research scientists employed by a large midwestern 
oil company. However, the number of subjects 
actually utilized in the sample varied from 362 to 
187 by virtue of the fact that completion of the 
484-item life history form was on a voluntary
basis, that responses to particular items were sometimes
omitted, and that there was some actual
personnel attrition variously caused (see Smith
et al., 1961).

The three criteria employed consisted of supervisory
ratings made separately on creativity and
overall performance and of the number of patent
disclosures submitted during a 5-year period.

The personal history items included in the factor
analysis contained 75 options which had been
previously found to discriminate on one or more of
the criterion variables at the .05 level or beyond.
The analysis itself was performed on the IBM
704 utilizing a 78-variable program obtained from
the Arizona State University (Wexler, 1960).


RESULTS

Before proceeding to the main body of the 
results it may add clarity to note the relation- 
ships among the several criteria. Thus, the 
two rating criteria, overall performance and 
creativity, correlated .54. The correlations 
between these and the patent disclosures 
criterion were .14 and .18, respectively.²

In the principal components analysis five 
factors, accounting for 23% of the variance, 
were extracted from the matrix. Each factor 
contained from 5 to 10 variables with factor 
loadings of .30 or greater. In part, these re- 
sults confirmed an expectation that life his- 
tory data would reveal considerable unique- 
ness. On the other hand there do appear to 
be some underlying characteristics which 
cause the data to cluster in meaningful and 
apparently significant factors. 


Factor Interpretations 


Favorable Self-Perception was the name 
assigned to the largest factor, accounting for 
6.3% of the total variance. People who ob- 
tain high scores on this factor receive high 
ratings on performance and creativity and 
characterize themselves as follows (rotated 
factor loadings appear in parentheses) : 

1. In the top 5% of performance in their occupation (.62)

2. Could be highly successful supervisors if given
the opportunity (.58)

3. Work at a faster pace than most people (.52)

4. Desire to work entirely autonomously, selecting
both a method and a goal (.47). This item is
substantiated by the fact that they like a lot of
responsibility (.44), and are willing to take a risk
(.31), and to handle many things simultaneously
[loading illegible]

5. In an unpleasant situation try to react and
formulate a decision immediately (.44)

6. Are friendly and easy-going and have many
friends (.32)

These characteristics appear applicable to
an individual who perceives himself as successful
in school, business, and interpersonal
relations. Secondary loadings also indicate a
high energy and drive level which may
result from favorable self-perception. The individual
wants to progress and feels that he
has taken advantage of opportunities presented
to him.

2 These correlations differ insignificantly from
those reported by Smith et al. (1961) due to the
use of criterion-keyed data in a different computer
program.

The second factor, which accounts for 4.9%
of the variance, has been designated as Inquisitive,
Professional Orientation. Individuals
who obtain high scores on this factor are
high in patent disclosures and describe themselves
as follows:


1. Completed their PhDs (.69)

2. Belong to one or more professional organizations (.52)

3. Do not limit themselves to undergraduate work
as an educational goal for their sons (.45)

4. Have some close friends and a number of
acquaintances (.42)

5. Devote much time to reading of many kinds
(.38), preferring current and political topics among
nonprofessional areas (.31)

6. Desire to work entirely autonomously (.35)

7. Have high salary aspirations (.31)

These subjects are reasonably highly rated 
on performance and creativity and tend to 
have a professional, intellectual orientation 
as seen in secondary factors. They are less 
flexible than low scoring people, and prefer 
to concentrate on a single project at a time. 
There tends to be a shaping of their profes- 
sional, research desires or interests to some 
of the demands of an industrial society. 

People who obtain high scores on the third 
factor, Utilitarian Drive, are rated highly 
in performance and creativity and characterize 
themselves as follows: 


1. Desire extrinsic rewards, i.e., from business and 
society (.46) 

2. Prefer urban dwelling (.46) 

3. Started dating prior to age 20 (.41) 

4. Feel free to express their views and perceive 
themselves as influencing others in group and in- 
dividual situations (.37), do not want to work with 
just one other person (.32) 

5. Do not desire to work entirely autonomously; 
want to choose their own method, but not neces- 
sarily the goal toward which they are to strive (.33) 

6. Feel dissatisfied with themselves at times (.31) 


These individuals appear to be socially
oriented and extraverted in this sense; but a
secondary loading indicates that they are not
really friendly, easy-going, and possessed of
a lot of friends. Rather, people seem to be
the medium which yields them both rewards
and evaluations of their performance and
which must, therefore, be integrated into their
way of life. This factor accounts for 4.0%
of the variance.

Fig. 1. Profiles of criterion factor loadings on each of five dimensions identified.

Factor 4 might be termed Tolerance for 
Ambiguity and accounts for 3.9% of the 
variance. People scoring high on this factor 
stand high in patent disclosures and de- 
scribe themselves as follows: 


1. Desire to have many work projects going simultaneously (.54)

2. Are not single (.50)

3. Have solicited funds for charity (.43), and
make speeches (.38)

4. Have high salary aspirations (.33)

5. Have friends with similar and dissimilar
political views (.34)

Secondary factor loadings help indicate that 
these individuals do adapt to the social 
climate of industry. They seek new ideas 
rather than leadership opportunities but ap- 
pear to accept the requirements of having 
friends and associates in the industrial and 
community environment. 

The fifth factor has been tentatively identi- 
fied as General Adjustment. High scoring in- 
dividuals consider themselves as follows: 


1. Feel that school material was adequately
presented (.78)

2. Came from homes where they were
well treated (.38)

3. Express their opinions readily and feel that
they are effective in doing so (.34)

4. Have high salary aspirations (.34)

5. Can tolerate inefficiency in a job better than
other, less controllable problems (.30)

6. Prefer verbal to laboratory course work (.30)

Both rating criteria appear with positive 
secondary factor loadings. The interpretation 
of the primary factor loadings appears to be 
substantiated by items which indicate willing- 
ness to assume responsibility and a desire to 
solve problems without going to a higher
authority for “directed” answers.
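The variance figures reported for the individual factors can be checked against the 23% total with a few lines of arithmetic. The sketch below assumes the reported percentages are rounded and that total variance equals the number of variables (78: 75 items plus 3 criteria), as it does when a correlation matrix is factored. The source does not state the share of the fifth factor, so the remainder computed here is an inference, not a reported value. The function names are illustrative.

```python
N_VARIABLES = 78          # 75 life history items + 3 criterion variables
TOTAL_PERCENT = 23.0      # variance accounted for by all five factors

# Percentages stated in the text for the first four factors
reported_percent = {
    "Favorable Self-Perception": 6.3,
    "Inquisitive, Professional Orientation": 4.9,
    "Utilitarian Drive": 4.0,
    "Tolerance for Ambiguity": 3.9,
}


def implied_remainder(total_percent, reported):
    """Percent of variance left for factors whose share is not stated."""
    return round(total_percent - sum(reported.values()), 1)


def sum_of_squared_loadings(percent, n_variables):
    """Convert a percent of total variance into a sum of squared loadings."""
    return percent / 100.0 * n_variables


print("Implied share of General Adjustment:",
      implied_remainder(TOTAL_PERCENT, reported_percent))
for name, percent in reported_percent.items():
    print(name, round(sum_of_squared_loadings(percent, N_VARIABLES), 2))
```

Because every input is rounded, the implied fifth-factor share should be read as approximate.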

Finally, in Figure 1 the three criteria have
been plotted or profiled simultaneously on
coordinates of their factor loadings, vertically,
versus the five factorial dimensions, horizontally.


DISCUSSION 


This simple, graphic presentation of re- 
sults suggests at least two implications of 
moment. First, the “face validity” of the 
researcher rated highly on overall performance 
is probably greater than that of his more 
creative counterpart. Such traits as Favorable
Self-Perception and Utilitarian Drive are at
least partially observable and very saleable,
whereas Inquisitive, Professional Orientation, 
and Tolerance for Ambiguity are neither di- 
rectly the former nor certainly the latter. It 
thus seems probable that, when a quasisub- 
jective method is employed, the truly creative 
person will receive a less favorable relative 
evaluation than he merits. 

Second, these same contrasting character- 
istics seem also to argue for differential treat- 
ment or handling of technical personnel. For 
example, Favorable Self-Perception is at least 
partially dependent upon the favorable per- 
ception of such significant others as a super- 
visor; and Utilitarian Drive feeds upon a 
knowledge of problems considered important 
by one’s organization and of rewards for their 
solution. Conversely, an Inquisitive, Profes- 
sional Orientation is satisfied by attacking 
problems that an individual and/or his profes- 
sional colleagues consider important; and 
Tolerance for Ambiguity argues that prob- 
lems need not be, and perhaps should not be, 
clearly structured and nicely pre-evaluated 
to be intriguing. 

Implications of these data for criterion
analysis are obvious and require little comment,
although one may be tempted to observe
again that a rating is a rating and no
great respecter of trait names.

Overall it is felt that the most important
outcome of the present study may reside in
the implicit demonstration that much of the
information necessary to formulate sensitive
and definitive personnel policies can be
derived directly from the characteristics of
those to whom the policies will apply.

THE VALIDITY OF A METHOD FOR SCORING SENTENCE-
COMPLETION RESPONSES FOR ANXIETY,
DEPENDENCY, AND HOSTILITY ¹

A scoring-by-example manual was developed for use with the Rotter Incomplete
Sentence Blank for scoring 3 variables. The aim was to provide an
objective scoring technique for the variables and to examine validity and 
discriminant scorer reliability by means of a multitrait-multimethod matrix. 
Validity data were obtained from a college population, the criterion being 
peer reputation measures of the variables. Self-descriptions provided a minimum 
competitive standard against which to judge the value of sentence-completion 
scoring. Discriminant scorer reliability was demonstrated and internal con- 
sistency reliability examined. The validity coefficients were modest but 
promising and provide a background against which future multitrait-multi- 
method examinations of other tests and keys can be evaluated. 


Among the projective methods, the sentence-completion
technique appears to possess
certain advantages. The economy in administration,
amenability to group testing and
quantitative scoring, and flexibility in construction
are particularly prominent. In the
majority of validity studies a criterion of
general “adjustment” (e.g., Carter, 1947;
Rotter, Rafferty, & Schachtitz, 1949; Sacks,
1949) or some global measure of psychiatric
disturbance has been employed. While such
criteria have obvious utility when the technique
is used as a gross screening device, they
are of lesser relevance to the psychologist
who wishes to make inferences about more
specific aspects of personality.

1 This study was supported by United States
Public Health Service Grant M1544 to Northwestern
University.

The scoring keys were developed by K. E. Renner
under the direction of B. A. Maher.

The detailed scoring manual providing examples
for each stem is available from the authors or
from the American Documentation Institute. Order
Document No. 7178 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance
$2.50 for microfilm or $6.25 for photocopies.
Make checks payable to: Chief, Photoduplication
Service, Library of Congress.

All ISB scorings were done by the senior author
and by R. E. Walker, whose assistance is gratefully
acknowledged.

2 Department of Social Relations.

This paper reports validity data for a 
method of scoring sentence-completions for 
three specific personality variables: anxiety, 
dependency, and hostility. The particular 
form of sentence completion used is the 
Rotter Incomplete Sentence Blank (Rotter & 
Rafferty, 1950). 

The three variables selected were thought 
to be of general clinical relevance, as esti- 
mates of their functioning form a common 
problem in clinical diagnosis. While the pri- 
mary aim was to provide scoring keys for 
the estimation of these variables from a 
sentence-completion protocol, a secondary 
goal was the establishing of a modus operandi 
which might be applicable by extension to 
other variables. One of these variables, 
dependency, has already been investigated 
extensively by Rotter and his students (Blyth, 
1953; Dunlap, 1951; Fitzgerald, 1958; 
Naylor, 1955). These investigators used a 
definition of dependency which was developed 
within the framework of Rotter’s social learning
theory (Rotter, 1954); this definition has
been considerably modified in the present 
study because it included many behaviors 
which do not possess face validity as measures 
of dependency as this is generally conceived. 
Thus, while the development of the scoring 
keys presented in this report owes much to 
the influence of this earlier work, the validity
data which we report have only a tangential
relationship to Rotter’s theory.


METHOD 
Development of the Scoring Keys 


In each case, the scoring keys were developed 
from an initial conceptual specification of the 
variable concerned. With this definition in mind, a 
series of behavioral referents was listed to provide 
operational representations of the variable. The 
definitions and some samples of behavioral referents 
follow: 

Anxiety. The anxious person is one who “worries,” 
expresses feelings of fear and apprehension, mani- 
fests symptoms of constant physiological readiness 
for action, fears, distractibility, and compulsive 
rituals. Referent behavior includes: 


1. Often has terrifying dreams 

2. Has phobias 

3. Lacks the ability to relax 

4. Is easily startled by a quick move or
unexpected noise


Dependency. The dependent person is one who 
seeks the aid of another individual or group in 
preventing punishment or frustration, or in making 
satisfactions available. Referent behaviors include: 

1. Likes security

2. Asks for help with work assignments

3. Asks aid of others in making dates, buying
clothing, etc.

4. Conforms readily to rules made by parents
and teachers


Hostility. The hostile person is one who expresses 
aggressive or hateful behavior or attitudes toward 
others, and reacts to minor frustration, social re- 
strictions, and other people with spiteful anger. 
Referent behaviors include: 


1. Frequently fights when provoked over insignificant
matters

2. Dislikes people in general 

3. Expresses strong hatred for minority groups 

4. Attacks physical objects and other scapegoats 
easily 

In a preliminary study, 50 completed incomplete
sentence blank (ISB) protocols of college students
were drawn from a pool of 206 obtained in an
elementary psychology class. Using these protocols
as a reservoir, responses were abstracted which
seemed relevant to the behavioral referents for
any of the three variables. Each response was assigned
a weight of 2, 1, or 0, depending upon
whether the experimenter judged that: (a)
it included an overt and unqualified reference to
one of the behavioral referents, (b) it gave a limited
or an indirect reference to these, or (c) it was
unrelated to any of the referents. These three
categories were scored 2, 1, and 0, respectively.
Examples of typical responses and the scores assigned
to them are given as follows:


Anxiety—I regret:
that I am always tired and tense—2
the thought of final examination week—1
not having more time to do everything I want
to do—0
Dependency—I regret:
having to be away from home—2
making decisions—1
that I have done some things I should not
have done—0
Hostility—I regret:
that I am continually losing my temper—2
not being nice to my old roommate—1
not being able to go to the beach every day—0


This pool of responses was assembled into a
scoring-by-example manual. The manual was then
applied to an additional 96 completed ISB protocols,
from the same pool, with the purpose of eliminating
inconsistencies, increasing the number of examples,
and providing a set of scores for preliminary evaluation
of scoring reliability. The end product was a
manual which provided some 15-20 illustrations of
scorings of completions for each stem for each of
the three variables. A second rater independently
scored separate samples of 20 protocols for each
of the three variables, the samples being drawn
from the scored pool of 96. To reduce the “halo”
effect as much as possible, each item was scored
on all 20 blanks before the rater proceeded to score
the next item. The stack of blanks was shuffled
between the scoring of each item. Following this
procedure interrater reliability was computed for
each of the variables by computing the product-moment
correlation for raw scores. The correlations
obtained were: Anxiety, .88; Dependency, .83; and
Hostility, .94.

In the major validity study, the description of
which follows, each protocol was scored independently
by two raters, with similar precautions against
a halo effect, and the score assigned to each subject
was the sum of the scores provided by the
two raters. This sum is used in the main analysis.
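The scoring arithmetic described above can be sketched as follows. The item weights and rater totals are hypothetical, invented only to make the example run, and the function names are illustrative. The sketch shows three steps from the procedure: summing the 2/1/0 item weights into a raw score, correlating two raters' raw scores with the product-moment formula, and summing the two raters' scores into the value used for analysis.

```python
import math

ALLOWED_WEIGHTS = (0, 1, 2)


def raw_score(item_weights):
    """Sum the 2/1/0 weights assigned to one protocol for one variable."""
    for weight in item_weights:
        if weight not in ALLOWED_WEIGHTS:
            raise ValueError(f"weight must be 0, 1, or 2, got {weight!r}")
    return sum(item_weights)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combined_scores(rater_1, rater_2):
    """Score used in the main analysis: sum of the two raters' scores."""
    return [a + b for a, b in zip(rater_1, rater_2)]


# Hypothetical raw Anxiety scores for six protocols from two raters
rater_1 = [raw_score(w) for w in ([2, 1, 0, 2], [0, 0, 1, 0], [1, 1, 1, 2],
                                  [2, 2, 2, 1], [0, 1, 0, 0], [1, 0, 2, 1])]
rater_2 = [5, 1, 4, 6, 2, 4]

print("interrater r:", round(pearson_r(rater_1, rater_2), 2))
print("combined scores:", combined_scores(rater_1, rater_2))
```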


Sample and Ratings 


The sentence-completion blanks were administered
to residents of a women’s cooperative dormitory,
and to residents of a men’s dormitory at separate
midwestern universities. Collected at the same time
were ratings by each subject of each other subject
in the group on 27 traits. The latter data were
obtained for a larger study of projection (Campbell,
Miller, & Lubetsky, 1959). Among the 27 traits
rated were anxiety, dependency, and hostility, which
were defined to the subjects in the following terms:
Anxious = worried; Dependent = not self-reliant,
seeks the help of others; Hostile = has feelings of
hate and anger towards others. It will be noted that
these definitions were deliberately colloquial in form,
as the intention was to use validity criteria based
upon man-in-the-street judgments about peers
rather than the more sophisticated ones which might
be used by a psychologist. Each subject rated other
subjects on a scale from 1 through 9 points, 9
being the maximum presence of the trait. Mean
reputation scores for each subject provide one of
the validity criteria used in this investigation. The
peer ratings are, of course, of recognized imperfection
and are not taken as defining operations
for the traits in question. Rather, in the tradition of
construct validity (Cronbach & Meehl, 1955) and
the multitrait-multimethod matrix (Campbell &
Fiske, 1959), they are introduced as another fallible
effort to assess the same traits, but one having
independent sources of imperfection or systematic
error. The reliabilities of these measures based upon
the correlations between odd versus even raters,
Spearman-Brown corrected, are presented in the
main diagonal of Tables 1 and 2.

Also obtained were Self-Descriptions. These are
regarded not so much as providing another validity
criterion but rather as a competitive minimum
standard against which to compare the sentence-completion
test. Of the three bodies of data, the
reputation measures are the most costly, most
awkwardly obtained, and the least generally available.
The sentence completions and the self-ratings
are both readily obtained, but the self-ratings are so
much more economical of subject and scorer time that
the sentence completion or any other projective test
would scarcely be justified if it were no better than
the simple self-ratings (e.g., see Campbell, 1960, p.
549). The self-ratings were made by each subject
using the same definitions as those used in the
peer-rating task, and with the same 9-point scale.
These were made upon two occasions 1 week apart,
the sum of the two being used as the measure.
The reliabilities reported in Tables 1 and 2 rep- 
resent the correlation between the two ratings with 
the Spearman-Brown correction. 
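The Spearman-Brown correction estimates the reliability of a sum of k parallel measurements from the correlation r between single measurements, as k·r / (1 + (k − 1)·r). A minimal sketch follows. The input correlations are the preliminary interrater values reported earlier (.88, .83, .94) and are used here purely as illustrative inputs; they are not the values entered in Tables 1 and 2. The function name is illustrative.

```python
def spearman_brown(r, k=2.0):
    """Estimated reliability of the sum of k parallel measurements.

    r is the correlation between two single measurements;
    k = 2 corresponds to summing two ratings or two occasions.
    """
    if not -1.0 < r <= 1.0:
        raise ValueError("r must lie in (-1, 1]")
    return k * r / (1.0 + (k - 1.0) * r)


# Illustrative inputs: preliminary interrater correlations from the text
for trait, r in (("Anxiety", 0.88), ("Dependency", 0.83), ("Hostility", 0.94)):
    print(trait, round(spearman_brown(r), 3))
```

The correction always raises a positive correlation toward 1, which is why the reliability of a two-rater sum exceeds the agreement between the two raters.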


RESULTS AND DISCUSSION 


Sentence-Completion Validity. Tables 1 and
2 present the intercorrelations among the
three traits as measured by the three methods,
arranged as a multitrait-multimethod matrix.
If we look first at the Sentence Completion
and Peer Reputation block, in Tables 1 and 2,
we find some evidence of significant validity.
All of the “high” values, above .20, are along
the validity diagonal. The .43 value for Anxiety
in the women’s matrix is significant at
the .01 level, and, in absolute magnitude,
about as high as one can expect a validity
value to be when the methods are as independent
as these (e.g., see Campbell & Fiske,
1959, for typical values). The validity values
of .22 and .34 for Hostility become significant
at the .01 level when averaged across the two
sexes. The values for Dependency are exactly
the opposite in direction from what was predicted—but
are, nonetheless, higher than any
of the heterotrait-heteromethod values in the
blocks. When averaged across sexes, the value
becomes significant at p < .05 by a two-tailed
test. In general, while the validity values are
low, they rise above the heterotrait-heteromethod
values that surround them, and are
not lower, in general, than the monomethod-heterotrait
values for Sentence Completion.
The Sentence Completion scoring seems to
have provided reasonably independent and to
some extent discriminantly valid measures.
The possibility that word fluency provided a
strong irrelevant methods component in the
sentence completion scores was excluded, the
highest correlations being for the Anxiety
scores, -.20 for women, -.12 for men. The
other values with word fluency are: Hostility,
.07, .17; Dependency, -.05, .09. (Word
fluency was measured by the total number of
words written by the subject on the ISB
protocol.)

3 A possible explanation of the paradoxical negative
dependency values was that ISB scores included
completions based on a rejection of own
dependency (e.g., I regret: the way I always need
help from others) which would correlate negatively
with reputation. To check on this possibility a
count was made of the relative frequency of the
rejection type of completion among the illustrations
provided in the scoring manual. Only 15 of
the 378 examples were of this nature, showing such
an explanation to be unlikely.


TABLE 1
VALIDITY MATRIX FOR WOMEN

[Matrix entries are not recoverable. Rows and columns: Anxiety, Dependency, and Hostility under each of three methods, Sentence Completion (A1, D1, H1), Peer Reputation (A2, D2, H2), and Self-Description (A3, D3, H3). Legible fragments in the Self-Description rows, columns unknown: Anxiety 34; Dependency 39; Hostility 10.]

Note.—N = 44, r > .28 p < .05, r > .37 p < .01.


TABLE 2
VALIDITY MATRIX FOR MEN

[Matrix entries are not recoverable. Rows and columns as in Table 1. Legible fragments, columns unknown: Sentence Completion Hostility H1 (.92); Self-Description Anxiety 24; Dependency .26, 37; Hostility 06, -.11.]

Note.—N = 48, r > .28 p < .05, r > .37 p < .01.
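The statement that the Hostility and Dependency validities become significant when averaged across the sexes can be illustrated with a common pooling method. The source does not say how the averaging was done, so the Fisher z approach below is an assumption, not the authors' procedure. It is also assumed that each pair of validities is listed women first (N = 44) and then men (N = 48), with the Dependency validities taken as -.23 and -.27 from the later discussion. The function name is illustrative.

```python
import math


def pooled_correlation(samples):
    """Pool independent correlations with Fisher's z transformation.

    `samples` is a list of (r, n) pairs. Returns the pooled r, the
    normal-deviate test statistic, and its two-tailed p value.
    """
    weights = [n - 3 for _, n in samples]          # inverse variances of z
    z_values = [math.atanh(r) for r, _ in samples]
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    z_stat = z_bar * math.sqrt(sum(weights))
    p_two_tailed = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return math.tanh(z_bar), z_stat, p_two_tailed


# Validity values quoted in the text; sample sizes from the table notes.
# Which value belongs to which sex is an assumption (women first).
hostility = pooled_correlation([(0.22, 44), (0.34, 48)])
dependency = pooled_correlation([(-0.23, 44), (-0.27, 48)])

for label, (r, z, p) in (("Hostility", hostility), ("Dependency", dependency)):
    print(label, "pooled r =", round(r, 2), "z =", round(z, 2), "p =", round(p, 3))
```

Under these assumptions the pooled Hostility value falls below the .01 level and the pooled Dependency value falls between .01 and .05 on a two-tailed test, in line with the levels stated in the text. Swapping which sex receives which value does not change either conclusion.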

Comparative Validity of Self-Descriptions. 
Comparing these validities with those of the 
Self-Description-Peer-Reputation block, the 
Sentence Completion test may be judged no 
worse, and possibly somewhat better. The 
Self-Description validities are relationally 
lower because surrounded by somewhat higher 
heterotrait-heteromethod values. However, 
compared with Sentence Completions, the 
Self-Descriptions are better (or more as ex- 
pected) for Female Dependency and are bet- 
ter for Male Anxiety. On the other hand, Self- 
Description Hostility is worthless for both 
sexes (it is actually more symptomatic of 
reputational Anxiety) whereas Sentence-Com- 
pletion Hostility has consistent validity. 

Rotter Adjustment and Dependency. The 
Sentence Completion keys were developed 
from Rotter’s test manual for general Ad- 
justment and an unpublished key developed 
by Rotter for Dependency. These two Rotter 
scorings were also made for the cases here 
studied (scored by only one judge). The Rot- 
ter Adjustment score correlates with the 
Maher-Renner scores of this study as follows:
Anxiety, female (f) = .57, male (m)
= .80; Dependency, f = .39, m = .40; Hostility,
f = .36, m = .40. The Rotter Dependency
scoring provides these values: Anxiety,
f = .59, m = .57; Dependency, f = .59, m
= .88; Hostility, f = .11, m = -.17. However,
even where the two correlate in the
.80’s, the Rotter scorings fail to give as high
validities as the Maher-Renner scores. The 
Rotter Dependency scores did not, on the 
other hand, provide the paradoxical negative 
values with reputational dependency. The 
obtained validities were f= .04, m= .05. 
Though more paradoxical, the Maher-Renner 
validities of —.23 and —.27 are judged more 
promising. 

Discriminant Reliability. The multitrait-multimethod
matrix presentation is also appropriate
for assessing the problem of “discriminant
reliability,” that is, of assuring that,
even in cases of imperfect scorer agreement,
there is essential common understanding
and that common distinctions are being made.
These problems were actively felt in develop- 
ing the present scoring key, where the same 
set of items were scored for all three traits, 
and where all three traits were common forms 
of maladjustment, often found in clinical 
practice as concomitants. To assess the scor- 
ing achievement in these regards Tables 3 
and 4 are presented in which each rater is 
treated as a separate “method.” It is clear, 
both in general and in detail, that the agree- 
ment is excellent, and that it is discriminat- 
ing. Note the almost total absence of any 
“methods variance” adhering to the ratings 
of one rater: The heterotrait-monomethod 
values are in general no higher than the 
heterotrait-heteromethod values. Note also 
the striking parallels in these values, particu- 
larly in the lower left hand triangles, for both 


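TABLE 3
CONVERGENT AND DISCRIMINANT SCORING AGREEMENT FOR WOMEN

[Cell entries are not recoverable from the source. Rows and columns are Anxiety, Dependency, and Hostility as scored by Rater 1 and by Rater 2.]

Note. N = 44, r > .28 p <


TABLE 4
CONVERGENT AND DISCRIMINANT SCORING AGREEMENT FOR MEN

[Most cell entries are not recoverable from the source. Rows and columns are Anxiety, Dependency, and Hostility as scored by Rater 1 and by Rater 2; the surviving fragments of the Rater 2 rows are reproduced as found.]

Rater 2
  Anxiety A2        77 ‘ .36
  Dependency D2     38 j .29
  Hostility H2      A7 =. 85 .24

Note. N = 48, r > .28 p < .05, r > .37 p < .01.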


males and females. The one asymmetry is 
found for the upper triangles of the heteromethod
block, where the A2D1 values are surprisingly
low for both sexes. To some extent,
the two raters are differing in their concepts 
of anxiety and dependency, with Rater 1’s 
Anxiety being closer to both Rater 1’s and 
Rater 2’s concept of Dependency than is 
Rater 2’s Anxiety. The scoring manual can 
probably be improved to achieve greater uni- 
formity in this regard, although the com- 
munication of discrimination is judged to be 
excellent. 

Internal Consistency Reliability. The Sen- 
tence Completion reliability values reported 
in Tables 1 and 2 are the interrater agree- 
ment values from Tables 3 and 4, Spearman- 
Brown corrected to estimate the reliability of 
the sum of the two ratings. Such a reliability 
has some comparability to the reputation reli- 
abilities. In both cases, it is a matter of agree- 
ment between judges who have observed a 
heterogeneity of specific occasions and be- 
haviors for each subject. In both cases, two 
sources of unreliability can be conceptualized: 
On the one hand, there is interjudge disagree- 
ment or judge incompetence in coding the be- 
havior samples. On the other hand, there is 
the subject’s inconsistency in behavior from 
instance to instance. This latter source of un- 
reliability corresponds most closely to struc- 
tured-test internal-consistency reliability of 
the split-half or coefficient alpha (Cronbach, 
1951) type. For the reputational measures, 
there is no opportunity to measure this second 
component of subject inconsistency. Since the 
Sentence Completion involves a numerical 
scoring on each of 40 items, such a reliability
becomes possible.
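
The Spearman-Brown step described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the function name and the example correlation of .60 are assumptions chosen for the sketch, and the formula is the standard Spearman-Brown prophecy formula for a composite of k parallel ratings.

```python
def spearman_brown(r, k=2):
    """Estimate the reliability of the sum of k parallel ratings.

    r is the correlation between two single ratings (here, interrater
    agreement); k is the number of ratings summed (2 for two raters).
    """
    if not -1.0 < r <= 1.0:
        raise ValueError("r must lie in (-1, 1]")
    return (k * r) / (1.0 + (k - 1) * r)


# Hypothetical interrater correlation, not a value from the tables.
single_rater_r = 0.60
print(round(spearman_brown(single_rater_r), 2))
```

With two raters, an agreement of .60 between single ratings corresponds to an estimated reliability of .75 for their sum, which is why the summed-rating reliabilities exceed the raw interrater correlations.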

To achieve this, each subject was given a 
score for each trait on each item which was 
the sum of the two raters’ ratings. These were 
then used to compute a coefficient alpha 
(Cronbach, 1951) internal consistency reli- 
ability. While these values still reflect both 
components of unreliability, they accentuate 
the contribution of intrasubject sources such 
as subject inconsistency or content inhomo- 
geneity among the items. These values are: 
Anxiety, m= .71, f = .69; Dependency, m 
= .74, f = .36; Hostility, m = .54, f = -.02.
While rarely reported for projectives, reli- 
ability values of this type are very relevant 
for the evaluation of projective tests. Al- 
though regarded as a lower-limit estimate of 
reliability, these values may be a more re- 
alistic estimate of the upper boundary for 
validity coefficients than interjudge agree- 
ment. The paradoxically low value for Fe- 
male Hostility is a clear exception, however, 
as this scoring provides “validity” correla- 
tions of .22 and .48, as shown in Table 1. 
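
Coefficient alpha as used in this section can be sketched as below. This is a minimal illustration under stated assumptions: the function name and the toy data are hypothetical, each row is one subject and each column one item (in the study, the summed two-rater score on one of the 40 items), and the formula is the standard one from Cronbach (1951).

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for a subjects-by-items score table.

    item_scores: list of rows, one per subject, each a list of k item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy data (hypothetical): three subjects, two items.
toy = [[1, 2], [2, 1], [3, 3]]
print(round(coefficient_alpha(toy), 2))
```

Because alpha falls as the items covary less, it can approach zero or turn slightly negative when item scores are nearly unrelated, as with the value of -.02 reported for Female Hostility.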

Comments. The absolute values of the va- 
lidity coefficients are, of course, disappoint- 
ing. But we lack comparably rigorous ex- 
aminations for other projective tests and few 
enough for objective tests (e.g., see Campbell 
& Fiske, 1959). We judge these to be on the 
more promising side, or at least worth pub- 
lishing to provide a part of that background 
against which future examinations of other 
tests and scoring keys can be evaluated. 

A projective protocol often is used clini- 
cally to assess more than one variable, intro- 
ducing the possibility of a strong methods 
component as a function of the instrument 
or the scorer. The present data are encour- 
aging in that the ISB, as a projective tech- 
nique, can be used after reasonable precau- 
tions for discriminant evaluations of several 
variables. This source of possible contamina- 
tion often is overlooked in projective data, al- 
though open to evaluation in terms of a multi- 
trait-multimethod matrix. The paradoxical 
negative dependency data suggest that the 
projective estimate was tapping a motiva- 
tional level, and the reputational measure a 
behavioral level. The projective measure seems 
to reflect a subject who relies on family values 
and protection, qualities that may keep him
behaviorally “independent” of his peers; 
whereas, the projectively independent sub- 
ject, emancipated from family protection, is 
perhaps behaviorally more “dependent” on 
his peers. The speculation is that at least for 
dependency, the projective scores are reflect- 
ing a motivational state more than reflecting 
current peer group behavior. 
SELF-CONFIDENCE AND LEADERSHIP

The object of this study was to examine the relationship between lack of
confidence in one’s leadership ability and reliance upon passive leadership
techniques to cope with supervisory problems. 77 Navy petty officers were
given a questionnaire containing 20 supervisory problems and were asked to
evaluate the desirability of each of 5 ways of solving each problem. Ss also
evaluated how satisfied they were with their leadership abilities. Principal
findings were: (a) Ss were highly consistent in the extent to which they
endorsed each of 5 approaches to correcting performance, (b) there was a
correlation of .52 between endorsing the use of administrative procedures to
solve the problem and informally asking a superior to solve the problem, and
(c) Ss who lacked confidence in their leadership abilities were significantly less
willing to hold face-to-face discussions with subordinates and significantly more
often endorsed both referring the subordinate to a superior and relying upon
the use of administrative rules to solve the supervisory problems.


As Gibb (1961) has pointed out, a rather 
important component of effective leadership 
is how well the leader uses the powers in- 
herent in his role for directing followers. This 
observation would seem to be especially true
in business and military organizations where 
supervisors are frequently chosen for their 
technical rather than leadership skills. Be- 
cause of this selection on other than leader- 
ship skills, many supervisors lack the charis- 
matic qualities needed to develop personal 
loyalties, and of necessity, must rely upon 
the application of the powers provided them 
by the organization to successfully direct 
subordinates’ performance. Thus it is of in- 
terest in hierarchical structures to investigate 
the repertoire of actions available to leaders 
and how this repertoire is used. 

Petty officers (POs) are the Navy’s first
line supervisors. In a previous study (Kipnis,
Lane, & Frankfurt, 1961) the repertoire of
actions available to these POs for correcting
subordinates’ behavior was investigated by
analyzing POs’ descriptions of how they corrected
subordinates’ performance. Among the
results was the finding that chief petty officers
(CPOs), who are the most senior enlisted
supervisors, used direct face-to-face
actions to correct subordinates more frequently
than did second class petty officers
(PO2s). The PO2s relied on the more passive
actions of either asking the help of a
superior in dealing with the subordinate or
placing the subordinate on report. Placing a
subordinate on report is a formal disciplinary
procedure that may lead to judicial action
against the subordinate. At a minimum, it
requires the commanding officer to decide
how to deal with the subordinate.

1 The views expressed here do not necessarily
represent those of the United States Navy.
The authors wish to thank Albert S. Glickman for
his helpful suggestions and criticisms.

There was, of course, considerable overlap 
in the corrective actions taken by CPOs and 
PO2s. Not all PO2s placed a man on report 
or referred him unofficially to a superior, and 
not all CPOs took direct face-to-face correc- 
tive actions. This finding suggested that it 
might not be simply variations in authority 
and power which accounted for the differences 
between CPOs and PO2s, although clearly
the factors involved were associated with these 
differences. 

The present study examined whether a 
PO’s confidence in his leadership ability was 
related to his reliance upon passive or non- 
passive leadership techniques to cope with 
supervisory problems. Since self-confidence 
tends to be correlated with amount of experi- 
ence and authority status, one would expect 
junior POs to be less confident than CPOs in 
their leadership capabilities. And since there 
is some experimental evidence (Hochbaum, 
1954) that low degrees of self-confidence lead 
to more reliance upon the opinions of others, 
it seems reasonable that junior POs’ inclinations
to rely upon other persons, officially or
unofficially, for help in solving their super- 
visory problems may be attributed to this 
variable. 


PROCEDURE 


Correction of Performance Questionnaire. The Cor- 
rection of Performance Questionnaire (COP) was 
based upon incidents gathered in earlier research 
(Kipnis et al., 1961) in which POs described how 
they corrected subordinates’ performance. Twenty 
incidents which described substandard behaviors of 
enlisted men were used. Following each incident, 
five alternatives were offered, each describing a 
different way of handling the problem. 

Alternative “A” advocated talking to the sub- 
ordinate to find out why he acted as he did; 
Alternative “B” advocated reprimanding the man; 
“C” advocated the use of sanctions such as extra 
instruction, denial of privileges, and inspection; “D” 
advocated informal referral of the problem to a 
superior for action; “E” advocated officially placing 
the subordinate on report. 

The POs frequently use more than one corrective 
action in dealing with a subordinate’s performance. 
Hence, it would have been unrealistic to restrict 
the number of alternatives POs could choose to 
handle the problem. Accordingly, the instructions 
required respondents to rate each alternative on a 
five-point scale, ranging from “A Highly Desirable 
Action” (HD) to a “Very Undesirable Action” (VU). 
A sample item follows: 

1. A seaman refused to take suggestions on how
work should be performed. When given a direct
order, the man became angry and answered back,
although he did not refuse to do the work.

a. Talk with the man to find out the cause of his actions.   HD  D  ?  U  VU
b. Warn the man that he will be reported for disrespect.   HD  D  ?  U  VU
c. Deprive the man of special privileges for a week.   HD  D  ?  U  VU
d. Ask the Division Officer to handle the man.   HD  D  ?  U  VU
e. Place the man on report for disrespect and failure to follow instructions.   HD  D  ?  U  VU


Weights of 5, 4, 3, 2, 1 were assigned to the scale
(with HD = 5). Five scores were derived from the
questionnaire, each with a possible range of 20 to
100: Diagnostic Talk, Reprimand, Sanctions, Referral,
and Report.
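
The scoring rule just described can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' scoring procedure: the function and variable names are hypothetical, and the three intermediate rating labels ("D", "?", "U") are assumed, since the source names only the two endpoints of the five-point scale (HD and VU).

```python
# Weights for the five-point scale; HD = 5 and VU = 1 come from the text,
# the intermediate labels are assumed for illustration.
WEIGHTS = {"HD": 5, "D": 4, "?": 3, "U": 2, "VU": 1}

# Alternative letter on each item -> the COP score it contributes to.
SCALES = {
    "a": "Diagnostic Talk",
    "b": "Reprimand",
    "c": "Sanctions",
    "d": "Referral",
    "e": "Report",
}


def score_cop(responses):
    """Sum the weighted ratings over all items, separately for each alternative.

    responses: one dict per incident, mapping "a".."e" to a rating label.
    """
    totals = {name: 0 for name in SCALES.values()}
    for item in responses:
        for letter, scale in SCALES.items():
            totals[scale] += WEIGHTS[item[letter]]
    return totals


# A hypothetical respondent who rates every alternative "HD" on all 20 items.
all_high = [{letter: "HD" for letter in SCALES} for _ in range(20)]
print(score_cop(all_high)["Referral"])
```

With 20 incidents and weights from 1 to 5, each of the five scores is bounded by 20 and 100, which is the range stated in the text.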
TABLE 1
SPLIT-HALF RELIABILITIES OF THE COP
(N = 77)

COP scores          Reliability
Diagnostic Talk     .91
Reprimand           .82
Sanctions           .84
Referral            .89
Report              .90


Self-Confidence Questionnaire. Respondents were 
also asked to indicate how satisfied they were with 
their own performance in each of seven leadership 
skills: 


1. Supervise your work crew’s technical work.

2. Help your men prepare for advancement.

3. Correct lax or unmilitary behavior in your men.

4. Get your men to work productively on hard
duty.

5. Criticize constructively poor performance of
your men.

6. Develop favorable attitudes toward the Navy
among your men.

7. Get your decisions backed up by your superior.


The five-point scale ranged from “Very Satisfied,” 
weighted 5, to “Not Satisfied at All,” with a weight 
of 1. A Self-Confidence score for each man was 
obtained by summing weights over all seven items 
for a possible range of 7 to 35. 

Sample. The questionnaires were administered
prior to training at three Navy Petty Officer Leadership
Schools. Seventy-seven usable questionnaires
were obtained from 27 CPOs, 44 PO1s, and 6 PO2s.
The PO1s are intermediate in supervisory rank between
CPOs and PO2s.

The sample was highly selected, since the policy 
of the Navy is to assign only POs who show above 
average leadership abilities to these schools. It is 
expected that the POs, upon completion of training, 
will help set up leadership training courses at their 
next duty stations. 

Questionnaire Reliability. Table 1 gives the cor- 
rected split-half reliabilities of the five scores of the 
COP. These ranged from .82 to .91. The previous 
research (Kipnis et al., 1961) found POs were con- 
sistent over several leadership problems in whether 
or not they would talk to a subordinate about his 
poor performance, would refer him to a superior, or 
would place him on report. The reliabilities reported 
in Table 1 tend to confirm those findings of in- 
dividual consistency in choice of leadership actions. 

The corrected split-half reliability of the Self- 
Confidence scale was .81. 


RESULTS 


The intercorrelations among the five scores
of the COP are shown in Table 2. There was
a fair degree of independence among the five
scores, given that they were derived from the
same instrument. The highest correlation was
between the Referral and Report scores (r =
.52, p < .01), suggesting that informally asking
for help from a supervisor as well as formally
invoking administrative rules to solve one’s
problems may share a common psychological
basis. It is also interesting to note that there
was a negative correlation between Diagnostic
Talk and Report (r = -.26, p < .05), suggesting
that the greater the willingness of the
PO to actively discuss the problem with his
subordinate, the less likely he was to rely on
administrative rules to solve the problem.


TABLE 2
INTERCORRELATIONS BETWEEN THE COP SCORES

[The correlation values are not recoverable from the source. The scores intercorrelated are:]
1. Diagnostic Talk
2. Reprimand
3. Sanctions
4. Referral
5. Report


TABLE 3
DISTRIBUTION OF SELF-CONFIDENCE SCORES
(By rank)

[Entries shown as "--" are not recoverable from the source.]

                             Chiefs       PO1s + PO2s
                             (N = 27)     (N = 50)
Self-Confidence scores       %            %
High third                   41           --
Middle third                 33           --
Low third                    26           --


TABLE 4
MEANS AND SUMMARY OF ANALYSIS OF VARIANCE OF THE FIVE COP SCORES

[Entries shown as "--" are not recoverable from the source; partial entries are reproduced as they survive.]

Mean COP scores

Self-Confidence         Chief      PO1 + PO2
Diagnostic Talk
  Average-high          89.65      86.52
  Low                   81.00      82.56
Reprimand
  Average-high          --         76.11
  Low                   --         75.30
Sanctions
  Average-high          69.60      69.00
  Low                   71.43      71.74
Referral
  Average-high          47.55      --
  Low                   54.00      --
Report
  Average-high          --         46.22
  Low                   53.        50.39

Analysis of variance (F)

Source         Diagnostic Talk   Reprimand   Sanctions   Referral   Report
Rank           --                --          --          --         02
Confidence     --                --          --          --         6.38*
C X R          --                --          --          --         1.38

Note. Average-high Self-Confidence: Chiefs = 20, PO1 + PO2 = 27. Low Self-Confidence: Chiefs = 7, PO1 + PO2 = 23.
*p < .05.

Self-Confidence scores were divided into
approximately equal thirds and the percentages
of CPOs and PO1s plus PO2s falling into
each third are shown in Table 3.

As was expected, more CPOs expressed confidence
in their leadership skills than did
PO1s + PO2s. Thus, 26% of the CPOs and
46% of the PO1s + PO2s were classified in
the low third of Self-Confidence scores.² While
the direction of the differences was consistent
with our original hypothesis, a chi square test
of these differences was not significant (χ² =
3.20, df = 2, p < .30). Since the POs in our
sample were originally highly selected in
terms of their leadership skills, it seems likely
that in a less selected sample, the trends apparent
here would reach customary levels of
significance.
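
A chi square test of this kind can be sketched as follows. This is an illustration only: the function names are hypothetical, the CPO counts (11, 9, 7) are what the reported percentages imply for N = 27, and the count of 23 low-confidence junior POs follows from 46% of 50, but the 14/13 split of the remaining junior POs is purely illustrative and is not taken from the study.

```python
import math


def chi_square(table):
    """Pearson chi square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of chi square with exactly 2 df: exp(-x/2)."""
    return math.exp(-stat / 2.0)


# Rows: CPOs, PO1s + PO2s. Columns: high, middle, low third.
# The 14/13 split in the second row is hypothetical.
counts = [[11, 9, 7], [14, 13, 23]]
stat, df = chi_square(counts)
print(round(stat, 2), df, round(p_value_df2(stat), 2))
```

For the reported statistic of 3.20 with 2 df, the closed-form tail probability exp(-1.6) is about .20, which is consistent with the stated p < .30 and well short of the .05 level.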

The relationship of Self-Confidence with 
the five COP scores was determined through 
analysis of variance. Since the earlier work 
(Kipnis et al., 1961) had found a relation- 
ship between supervisory status and leader- 
ship practices, a factorial design was used 
including both Self-Confidence scores (divided 
into upper two-thirds and lower one-third) and 
Rank (CPOs versus PO1s + PO2s) as independent
variables. To take into account the
unequal numbers in the various cells, a cor- 
rection proposed by Walker and Lev (1953, 
pp. 381-382) was used. 

Table 4 presents the results of the analysis. 
It can be seen that those POs classified as 
low in Self-Confidence were significantly less 
willing to hold diagnostic talks with the sub- 
ordinate (p< .05), and significantly more 
often endorsed both referring the subordinate 
to a superior (p < .05) and placing the sub- 
ordinate on report (p < .05). 

On the other hand it is apparent that,
within the limited range of CPOs, PO1s, and
PO2s studied here, PO status alone had no
specific effect on COP scores. The few CPOs
who were categorized as low in Self-Confidence
had the same pattern of COP scores as
the more junior POs who were so classified.

2 While not analyzed separately because of the
small numbers, it is worth noting that none of the
6 PO2s were classified in the high third of Self-
Confidence scores.

WILLIAM P. LANE


DISCUSSION

The results indicate that doubts about one’s 
leadership abilities lead to what might loosely 
be called a “buckpassing” approach to solving 
leadership problems. These results tie in 
nicely with the previous findings that junior 
POs exercised less direct leadership and placed 
more reliance upon unofficial referrals to 
superiors or upon placing a subordinate on 
report. The suggested relationship seems to 
be: inexperience contributes to lack of confidence,
which in turn leads to reliance on
others for solving leadership problems.

It would seem that effective leadership
problem solving in a hierarchical organization
has at least two components. First, the
supervisor must be able to intelligently select
the appropriate corrective actions from the
repertoire of behaviors available to him. In
the previous research (Kipnis et al., 1961),
it was found that the supervisor systematically
varied the kinds of corrective actions
used as the problems presented by the subordinate
varied. Secondly, the supervisor must
be willing to take action to solve the problem
and not get in the habit of placing the
problem in someone else’s lap. The importance 
of this motivational variable has been sug- 
gested in several reports. For example, Havron 
and McGrath (1961) reported that effective 
leaders were characterized by a “willingness 
to act” when a problem situation arose. Gibb 
(1961) has done work on the ability of the 
leader to use the repertoire of behaviors he 
has at his command, and Clark, Spector, and 
Glickman (1960) identified an effective Navy 
CPO as one who made full use of his super- 
visory powers. 

It is evident that there are other factors 
besides self-confidence and experience that are 
related to a willingness to take direct leader- 
ship actions. For example, span of control of 
the supervisor may be important. We found 
that POs who supervised large groups of
men relied more on referral and report and
less on direct supervision than did POs supervising
smaller groups.

The correlation between the Referral and
Report scores suggests that unofficially asking
for help and advice from a superior and
reliance upon administrative regulations share 
common psychological elements. Both of these 
actions seem to be part of the bureaucratic 
buckpassing syndrome, serving as a means for
the individual leader to avoid taking personal 
action and making difficult decisions. 

While psychologically these two actions 
may share common elements, administratively 
the consequences of reliance upon regulations 
are quite different from asking for advice in- 
formally. For instance, it raises the question 
as to whether subordinates may be subjected 
to disciplinary action because they are by 
unhappy chance subject to the supervision of 
an inadequate leader. Invoking administrative
regulations as a substitute for personal action
also burdens higher executive levels with
decisions that should have been made at a
much lower echelon. Obviously all levels of 
an organization suffer to the extent that super- 
visors too frequently rely on this mode of 
supervision. 
Recently an article was published by
Merenda and Clarke (Merenda & Clarke,
1959) which appears to us to be an example
of “colored” reporting of a research study.
It is extremely unlikely, of course, that
Merenda and Clarke were motivated by any
deliberate intent to deceive the reader; even
so we believe the net effect of their report is
to lead the reader to inappropriate and erroneous
conclusions. Our article is, therefore, a
critique of their research report. Our purpose
is to call attention to poor reporting of data,
and to illustrate in general what should not
be done in the publication of psychological
research.

To illustrate some of the report’s shortcomings
let us imagine we are listing a series
of techniques which produce distortions in
interpretation of data, many of which appear
in the article by Merenda and Clarke. Their
study was carried out for the purpose of
evaluating the predictive validities of the
Activity Vector Analysis (AVA) (Baxter,
1959; Bennett, 1959) and personal history
variables as selectors of life insurance salesmen.
Using a criterion combining tenure and
sales production, they investigated AVA along
with various personal history factors and used
Discriminant Analysis to derive scoring
weights to maximally separate successful
salesmen from unsuccessful salesmen. Their
study is a predictive one because the AVA
and a personal history questionnaire were
administered to applicants for sales positions;
both instruments were used as aids in making
hiring decisions. Even so, only 108 (21%)
of 522 applicants hired were regarded as successful
at the end of 3 years. The results of
the 3-year follow-up study are, in our opinion,
reported in such a way as to lead even a
sophisticated reader to misleading and erroneous
conclusions.

Several of the techniques used which
mislead the reader are listed below:


Technique I 


Emphasize statistical significance when the 
data are based on large samples and the 
amount of overlapping between distributions 
is large. 

Merenda and Clarke give information based
on AVA profile scores and personal history
variables for 108 successful and 414 unsuccessful
salesmen. Mean differences between
these two groups are significant statistically
for eight of the nine variables shown in
their tables. Correlations estimated for the
same data range from -.18 (AVA Social
Adaptability) to +.19 (Monthly Living Expenses).
Without regard to sign, the median
coefficient is .14. An estimate of the validity
of the four AVA variables for predicting success
among these salesmen is given by the
multiple correlation coefficient of R = .21
computed by us from data given in the
Merenda and Clarke article and that shown
in Table 1 obtained from the American Documentation
Institute (ADI). In order to assess
the practical significance of these correlations
Tilton’s (1937) overlap measure was applied.
The overlap statistic yields values ranging
from a high of 95% (Number of Offices
Held) to a low of 85% (Monthly Living Expenses)
with a median value of 89%. This
means that a minimum of 85% of the scores
of the successful salesmen are duplicated by
scores of the unsuccessful salesmen. Evidence
of this kind is damaging to the practical
validity of a selection device. According to
Technique I, such evidence should not be
reported. (To their credit, Merenda and
Clarke do report the correlations obtained
but they fail to discuss the implications of
their low values.) Such data give the reader
TABLE 1

RESULTANT CORRELATION MATRIX OF VARIABLES FOR TOTAL SAMPLE

[The correlation values are not recoverable from the source. The variables in the matrix are:]
AVA Aggressiveness
AVA Sociability
AVA Emotional Control
AVA Social Adaptability
Number of Children
Educational Level
Offices Held
Living Expenses
Insurance Owned


a more realistic view of results and need to
be discussed along with t tests and probability
levels. Minor differences can be statistically
significant when the Ns are large even when
practical significance is lacking.
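
The overlap idea can be made concrete with a short sketch. This is an illustration under stated assumptions, not a reconstruction of the authors' computation: it uses the common form of Tilton's overlap for two normal distributions with equal standard deviations, and the function names and example means are hypothetical. The source does not say how the overlap values were obtained.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tilton_overlap(mean_a, mean_b, pooled_sd):
    """Proportion of overlap between two equal-variance normal distributions.

    With d = |mean_a - mean_b| / pooled_sd, the overlap is 2 * Phi(-d / 2).
    """
    d = abs(mean_a - mean_b) / pooled_sd
    return 2.0 * normal_cdf(-d / 2.0)


# Hypothetical groups whose means differ by 0.387 standard deviations.
print(round(tilton_overlap(0.0, 0.387, 1.0), 2))
```

Under these assumptions, group means separated by a little under four tenths of a standard deviation still leave about 85% overlap, and the overlap reaches 100% when the means coincide. A difference of that size can be highly significant with several hundred cases while remaining of little practical use for selection.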


Technique II


Refer to overlapping data or magnitudes of
correlation coefficients only when it is to your
advantage to do so.

Data for 15 other personal history variables
which were investigated were not reported.
Merenda and Clarke (1959) state:

For the other fifteen personal history measures, the
frequency distributions for successful and unsuccessful
agents were either nearly completely overlapping or
else the differences in means and variance were not
statistically significant.

We would have said the same for the data
discussed in the preceding section.

Later, in referring to data deposited with
ADI, Merenda and Clarke state:

The correlational data showed that the personal
history measures are not only independent of the
personality variables but are also uncorrelated with
each other. On the basis of these findings, it was
decided to employ these two predictor sets as
independent batteries each with its own minimum
cut-off score (p. 362).

Now, mere statistical significance no longer
seems important. In this instance, Merenda
and Clarke gave attention to the actual magnitude
of correlations. Examination of this
table is of interest.


Technique III


If data confuse the reader or possibly lead
him to come to conclusions opposed to those
you wish to state, deposit it with ADI.

The data deposited with ADI by Merenda
and Clarke are reproduced in Table 1. It may
be noted that of the correlations between
AVA scores and personal history variables
seven are statistically significant at the 5%
level or better. This ratio (7 of 20) of significant
correlations to the total calculated is
slightly greater than that obtained between
the AVA scores, the 20 personal history variables
and effectiveness in sales (8 of 24). We
certainly agree that the low magnitude of
coefficients in Table 1 suggests independence
between personal history variables and AVA
scores and also among the personal history
variables. We contend, too, however, that
Merenda and Clarke should also conclude
that AVA scores and personal history variables
are independent of sales effectiveness
since these correlations are approximately the
same low magnitude as those deposited with
ADI.


Technique IV


Report commonly calculated statistics such
as means and standard deviations to many
decimal places so as to give the impression of
great care and precision in conducting the
study.

Merenda and Clarke report means and 
standard deviations to five decimal places. 
Several decimal places, of course, are neces- 
sary during the process of computing dis- 
criminant weights. Two decimal places are a 
great sufficiency, however, for the actual re- 
porting of final results. Additional decimal 
places at this stage simply take up space and 
make interpretations of statistical data less 
straightforward for the reader. This example 
may explain why Bennett (1959) in his re- 
view of the Activity Vector Analysis tech- 
nique, states: “The mumbo-jumbo of allegedly 
sophisticated statistical procedures is no sub- 
stitute for demonstrated validity.” 

Technique V 

Apply scoring weights (e.g., discriminant 
weights) directly to the group on which they 
were originally derived. 

This practice and its effects were well 
spelled out by Cureton (1950) when he dis- 
cussed “Validity, Reliability, and Baloney.” 
Applying scoring weights to the group on 
which they were developed capitalizes on the 
possibility of chance fluctuations in the data 
and gives the possibly false impression of 
higher validity than is warranted. Cureton 
calls this “baloney.” A large portion of the 
Merenda and Clarke article is devoted to a 
discussion of results obtained when apply- 
ing discriminant weights to AVA scores and 
to personal history variables. Merenda and 
Clarke examine various cutting scores in an 
effort to discover the most efficient combina- 
tion of predicting sales success. The various 
cutting scores found to be most efficient, their 
specific combination and overall validity for 
the set of predictors are all subject to in- 
stability and shrinkage on cross-validation. 
The practice involved in Technique V is with- 
out merit and there is little assurance that 
results will hold up when applied to inde- 
pendently selected samples of subjects. 

In the case of the Merenda and Clarke 
article, however, the practice should not re- 
sult in significantly deluding the sophisticated 
reader. They recommend combining a cutting
score of 0 for the AVA discriminant score
with one of 91 for the personal history discriminant
score. Based on frequencies provided
in the article, we find that this combination
of scores yields a selection ratio
of .67. This means that two-thirds of those 
“tested” with AVA would be hired. Entering 
the Taylor-Russell tables (Taylor & Russell, 
1939) based on an estimated validity of .20, 
we find that the use of AVA and personal 
history information would result in increasing 
the proportion of successful salesmen from .21 
to .24. In other words, even by maximizing 
all differences (including many probably due 
to chance) and by assuming perfect stability 
of cutting scores, only a 3% improvement in 
selection occurs. Such small improvement as 
this hardly seems to warrant the cost of using 
any selection device. 
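
The Taylor-Russell reasoning above can be checked numerically. This is a minimal sketch, not the published tables: it assumes, as those tables do, that predictor and criterion are bivariate normal, and it replaces the table lookup with Simpson's rule integration. The function name and the integration settings are assumptions made for the sketch.

```python
import math
from statistics import NormalDist


def success_rate_among_selected(base_rate, selection_ratio, validity, steps=2000):
    """Expected proportion of successes among those selected.

    base_rate: proportion successful without selection (0 < base_rate < 1)
    selection_ratio: proportion of applicants selected (0 < ratio < 1)
    validity: predictor-criterion correlation, with |validity| < 1
    steps: even number of Simpson intervals
    """
    std = NormalDist()
    x_cut = std.inv_cdf(1.0 - selection_ratio)  # predictor cutting score
    y_cut = std.inv_cdf(1.0 - base_rate)        # criterion success threshold
    resid_sd = math.sqrt(1.0 - validity ** 2)

    def integrand(x):
        # density of x times P(criterion > y_cut | predictor = x)
        return std.pdf(x) * (1.0 - std.cdf((y_cut - validity * x) / resid_sd))

    upper = 8.0  # the normal density beyond 8 SD is negligible
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    joint = total * h / 3.0  # P(selected and successful)
    return joint / selection_ratio


print(round(success_rate_among_selected(0.21, 0.67, 0.20), 2))
```

With a base rate of .21, a selection ratio of .67, and a validity of .20, the integral gives about .24, in line with the tabled value quoted in the text. Setting the validity to zero returns the base rate itself, which is a quick check that the integration behaves sensibly.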


Technique VI 


Use the term valid to refer to any differ- 
ence or correlation which achieves statistical 
significance. Try to avoid any connotations 
involving practical validity or the question of 
whether or not a device would prove prac- 
tically useful in an actual selection situation. 

Merenda and Clarke state the following 
conclusion, based on validity coefficients, none 
of which exceed .19: “The AVA is a valid 
predictor of success-failure among life insurance
agents.” Few statisticians would be satisfied
with a validity coefficient of .19; even
fewer would claim “validity” for an instru- 
ment or technique on the basis of such limited 
predictive utility. 


Technique VII 


Write a brief and easily understood sum- 
mary and conclusions. Be sure to state the 
conclusions positively and in line with what 
you hoped to show. 


If the conclusions offer a pleasing and easy-to-read
contrast with hard-to-understand statistics
in the body of the article, many readers
will be tempted to take the author’s word for
their content. Merenda and Clarke (1959)
offer the following statement of conclusions:


(a) The AVA is a valid predictor of success-failure 
among life insurance agents. 

(b) Certain personal history measures are valid
predictors of success-failure among life insurance
agents.

(c) Combining AVA and personal history data
enhances the predictive efficiency of these measures in
determining the success or failure of life insurance
agents over a sustained period of time (p. 366).

Based on the data presented in the article, 
we disagree with the “tone” of the above con- 
clusions. We believe the specific information 
given in this article does not merit calling the 
AVA a valid predictor of insurance selling 
success. We believe businessmen who receive
reprints of this article and even some psy- 
chologists are going to be misled simply be- 
cause they read only the summary and con- 
clusions. 

We believe the article in question can be 
used as a negative teaching tool, since it pro- 
vides a series of illustrations of what not to 
do in scientific reporting; as such, it may in 
the long run serve a more beneficial purpose 
than one would usually predict for such an 
article.

Dunnette and Kirchner suggest that our
article is an example of “colored” reporting 
of a research study. We submit that their 
critique is actually the example of such biased 
reporting. 

Their presentation is biased right from the 
start. They begin by emphasizing the high 
attrition rate among insurance salesmen and 
implying that this phenomenon is not stand- 
ard and universal—which it is. We made this 
point clear in the introductory paragraph of 
our article. Then they fail to say anything 
about the selective nature of our sample which 
caused the restriction in range in the groups 
being studied. This error of negative instance 
occurs when they fail to make reference to 
the study which is basic to the article being 
studied. In this basic report (Merenda & 
Clarke, 1959a) it is obvious that the sample 
of the study in question is highly attenuated 
with respect to the personality profiles. We 
quote from the text of this article: 


Inasmuch as this company was a client during this 
period, and since a great majority of the general 
agents employed the AVA in their selection kit, the
sample can be expected to be a highly restricted one 
in terms of AVA profiles. . . . Hence, the findings 
of this study must be considered in the light of the 
restricted nature of the subjects used.


In their first drafts of this critique Dun- 
nette and Kirchner erroneously concluded that 
the individual rp’s were being reported as 
separate validity coefficients. It was we who 
advised them that the appropriate measure of 
validity for the AVA set would be multiple R 
derived from a modified discriminant function 
analysis (Fisher, 1936). But we also advised 
them that due to the restriction in range on 
the AVA a multivariate correction needed to 
be made (Gulliksen, 1950) to estimate multi- 
ple R for the attenuated sample. It is likely 
that it was not possible for them to make the 
correction, but this is no excuse for their fol- 
lowing only one half of our suggestion and 
not doing anything (let alone not saying anything)
about the other half.
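
To show why a correction for restriction in range matters, the sketch below applies the familiar univariate correction for direct selection on a single predictor. This is a simpler stand-in, not the multivariate correction (Gulliksen, 1950) referred to above, and the ratio of standard deviations used in the example is hypothetical; no such ratio is reported in the text.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction for direct restriction in range on the predictor.

    r_restricted: correlation observed in the selected (restricted) group.
    sd_unrestricted: predictor standard deviation among all applicants.
    sd_restricted: predictor standard deviation in the selected group.
    """
    k = sd_unrestricted / sd_restricted
    numerator = r_restricted * k
    denominator = math.sqrt(1.0 - r_restricted ** 2 + (r_restricted * k) ** 2)
    return numerator / denominator


# Hypothetical example: an observed coefficient of .19 in a group whose
# predictor spread is half that of the applicant population.
print(round(correct_for_range_restriction(0.19, 2.0, 1.0), 3))
```

When the two standard deviations are equal the function returns the observed coefficient unchanged; the size of any real correction depends entirely on the unrestricted variances, which would have to come from the applicant data.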

Inclusion of data relative to the 15 personal 
history variables which were not employed as 
predictor variables would have taken up too 
much space in the journal, and certainly 
would have been omitted by the editors as 
they omitted Table 5 by advising us to de- 
posit it with the American Documentation In- 
stitute (ADI). 

With reference to the variables actually
used in the predictor sets, it was found that
in the light of the restricted nature of the
sample, the differences in group means relative
to the variances appeared to have discrimination
value. For the 15 personal history
variables excluded as predictors in the
PH set, it was found that practically no mean
differences existed. As for the correlational
data of Table 5, no statistical significance
tests were made by us. These values were
calculated by Dunnette and Kirchner. We
merely used the intercorrelational values as
a guide in deciding whether or not to combine
the personality and personal history data into
one predictor set or to use each set independently.
The results of the significance tests
would not have altered our decision.


In previous rejoinders which we have prepared
to the first two drafts of this critique
by Dunnette and Kirchner, both of which
have required revisions by the authors, we
explicitly pointed out (with documentation)
that Table 5 was submitted by us along with
the other tables in the original manuscript
sent to the Journal of Applied Psychology.
However, the editor along with his consulting
editors felt the data of the table were not
particularly germane to the study, so they advised
that we deposit this table with ADI.
Hence, the authors’ remarks on this point do
not apply to us, but rather to the editors of
the Journal of Applied Psychology. Dunnette
and Kirchner are well aware of this fact; yet
they persist in implying in their article that
we deliberately deposited the table with ADI
in order to hide the data. This is certainly
another example of colored reporting on their 
part. Furthermore, we wish to state that of 
21 articles on the AVA published by us, either 
jointly or singly, in only one other instance 
was a table deposited with ADI. This was the 
intercorrelation matrix of the 81 AVA words 
which could not be feasibly handled in the 
journal space. 

With reference to the comments regarding 
decimals, we kept the five places in our tables 
simply because this was the form in which 
they came off the computer, and we found 
that we could do so without taking up addi- 
tional publication space. This permits all the 
data to be consistently reported (at least five 
decimal places were required for the linear 
discriminant weights). The number of decimal 
places to which the distribution data are taken 
does not affect the difficulty of interpretation 
as the critics claim. In our review of their 
critique, we observe that it is not the num- 
ber of decimal places to which the statistics 
were reported that caused Dunnette and 
Kirchner to erroneously interpret the data of 
our tables. 

We agree with Dunnette and Kirchner that 
it is more desirable to cross-validate the dis- 
criminant weights than to apply the dis- 
criminant weights directly to the group on 
which they were originally derived. We ap- 
plied the discriminant weights back to the 
same subjects of the study because at that 
time we were engaged in a concurrent validity 
investigation (at least as far as the PH set 
was concerned). This point was made to and
accepted by the editor who was responsible
for publishing the article. A subsequent study
was made, applying the discriminant weights
to an independent sample of 535 life insurance
salesmen. The results were upheld, and
a report of this cross-validation study has
been published in this same journal (Merenda,
Hall, & Clarke, 1961). Hence, the critics’
judgment concerning the likely instability and 
shrinkage upon cross-validation, and that the 
“practice [of applying discriminant weights 
directly to the group upon which they were 
originally derived] is without merit and there 
is little assurance that results will hold up 
when applied to independently selected samples
of subjects,” is not only incorrect, but
unwarrantedly misleading to the reader, because
they were aware of the cross-validity
study and its results when they prepared this 
critique. Yet they not only fail to mention the 
cross-validity study, but they make it appear 
to the reader that such a study would only 
lead to results which are opposite to the ones 
they already knew existed. Is this not a clear 
example of colored reporting and suggestive 
of the value of their judgment? 

That .19 is a rather low validity coefficient
is not denied. However, as we have indicated
previously in this article, Dunnette and
Kirchner were in error when they originally
assumed that the rp’s of Tables 1 and 3 of
the article in question (Merenda & Clarke,
1959b) were validity coefficients. We made
no such claims in our discussion and it would
be improper to do so. Also consider the illogic
of the reporting by Dunnette and Kirchner
wherein they persist in insisting that the
uncorrected rp’s are our validity coefficients
after they had apparently accepted our earlier
rebuttal on this point and changed their
critique accordingly (see their Technique I).

The demonstration of validity lies in the
significant discriminations between the successful
and unsuccessful groups of agents in
Tables 2 and 4 (Merenda & Clarke, 1959b)
and in Tables 5 and 7 (Merenda, Hall, &
Clarke, 1961) and, for the AVA alone, from
the data of the basic report (Merenda &
Clarke, 1959a) interpreted in the light of the
attenuation in the sample. That significant
discrimination does exist even with these
highly selected groups is acknowledged by the
editors’ comments to the authors to the effect
that the major discrimination by the AVA
occurs at the extremely lower scoring range.
It will be noted that we carefully pointed this
out in our revised text (1959b, p. 363).

As for practical validity, this is clearly apparent
from the data of Table 7 and our discussion
of these data (1959b, pp. 364-365),
of which our critics make no mention in their
critique. We assure the reader that the company,
in which the research was conducted
jointly by us and their research staff,
considered these results to be indicative of practical
validity. On the basis of these findings
and the later cross-validity findings, the systems
were made operational for screening applicants
and rejecting those with low success
prediction. Also, on the basis of their actuarial
acumen and greater knowledge of recruit- 
ment, training, and maintenance costs, the 
company research staff extrapolated the re- 
sults of the concurrent validity research and 
concluded in their own internally distributed 
technical report that if it (the proposed Ap- 
plicant Rating System) had been used in se- 
lecting the 522 agents in the sample, net gains 
in new business for their first 3 years would 
have been over $28,000,000.00. 

The summary and conclusions are briefly 
and clearly stated. That they are tenable from 
the data of our study should now be rather 
clear to the reader of this rejoinder. We hold 
that the conclusions drawn from the data of
this study are tenable in much the same way 
that the validity of college entrance tests is 
established. While the validity coefficients of 
such selection devices against long-range suc- 
cess criteria are relatively low, the tests are 
viewed as being valid for selecting college 
freshmen since those with low scores have 
little or no chance of completing 4 years of 
college. Similar validity may be inferred, we 
feel, from the data of our study beyond that 
which is actually empirically demonstrated. 
We base this opinion on the assumption that 
the substantial number of applicants with 
personality profiles which are incompatible 
(negatively correlated) with the hypothesized 
“best” profile and who were rejected on that 
basis initially (therefore never becoming a 
part of the sample) could have been predicted 
with a probability of near certainty of becom- 
ing failures by the end of the third year after 
hire. In the attenuated sample 40 out of 42, 
or 95%, of such cases proved to be unsuc- 
cessful “career” life insurance salesmen. 

We had no intention of misleading the 
reader, and we disagree with Dunnette and 
Kirchner that businessmen or psychologists
who read the article are likely to be misled. 
Our report of this investigation was submitted 
to the Journal of Applied Psychology in ful- 
fillment of our professional obligation to psy- 
chologists. Other nontechnical reports have 
been prepared both by us and our other col- 
leagues who either collaborated with us in this 
joint research effort or who were profession- 
ally interested in this rather extensive longi- 
tudinal study of selection procedures for life 
insurance agents. All of these persons reviewed 
and approved the prepublication manuscript 
of the article in question. These professional 
research personnel (psychologists, economists, 
and market analysts) all felt the conclusions 
were tenable after a careful review of all 
phases of the study extending over a period 
of 2 years. In fact, one of the nontechnical 
reports on our study prepared for and dis- 
tributed to lay personnel of life insurance 
companies by the research department of a 
trade association for the life insurance indus- 
try states that they had examined the study 
from a technical viewpoint and found it to be 
sound. 